Abstract

Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted 
stochastic games are classical models of game theory describing two-player zero-sum games of 
potentially infinite duration. We describe algorithms for exactly solving these games. When the 
number of positions of the game is constant, our algorithms run in polynomial time. 

1 Introduction 

Shapley's model of finite stochastic games [34] is a classical model of game theory describing
two-player zero-sum games of (potentially) infinite duration. Such a game is given by a finite set of
positions $1, \ldots, N$, with an $m_k \times n_k$ reward matrix $(a^k_{ij})$ associated to each position $k$, and an
$m_k \times n_k$ transition matrix $(p^{kl}_{ij})$ associated to each pair of positions $k$ and $l$. The game is played in rounds,
with some position $k$ being the current position in each round. At each such round, Player I chooses
an action $i \in \{1, 2, \ldots, m_k\}$ while simultaneously Player II chooses an action $j \in \{1, 2, \ldots, n_k\}$,
after which the (possibly negative) reward $a^k_{ij}$ is paid by Player II to Player I, and with probability
$p^{kl}_{ij}$ the current position becomes $l$ for the next round.

During play of a stochastic game, a sequence of rewards is paid by Player II to Player I. There 
are three standard ways of associating a payoff to Player I from such a sequence, leading to three 
different variants of the stochastic game model: 

Shapley games. In Shapley's original paper, the payoff is simply the sum of rewards. While this
is not well-defined in general, in Shapley's setting it is required that for all positions $k$, $\sum_l p^{kl}_{ij} < 1$,
with the remaining probability mass resulting in termination of play. Thus, no matter which actions
are chosen by the players, play eventually ends with probability 1, making the payoff well-defined
except with probability 0. We shall refer to this original variant of the stochastic games model
as Shapley games. Shapley observed that an alternative formulation of this payoff criterion is to
require $\sum_l p^{kl}_{ij} = 1$, but discounting rewards, i.e., penalizing a reward accumulated at time $t$ by a
factor of $\gamma^t$ where $\gamma$ is a discount factor strictly between 0 and 1. Therefore, Shapley games are
also often referred to as discounted stochastic games. Using the Banach fixed point theorem in 
combination with the von Neumann minimax theorem for matrix games, Shapley showed that all 
Shapley games have a value, or, more precisely, a value vector, one value for each position. Also, the 
values can be guaranteed by both players by a stationary strategy, i.e., a strategy that associates 
a fixed probability distribution on actions to each position and therefore does not take history of 
play into account. 

Gillette games. Gillette [23] requires that for all $k, i, j$, $\sum_l p^{kl}_{ij} = 1$, i.e., plays are infinite.
The total payoff to Player I is $\liminf_{T \to \infty} (\sum_{t=1}^{T} r_t)/T$ where $r_t$ is the reward collected at round
t. Such games are called undiscounted or limiting average stochastic games. In this paper, for 
coherence of terminology, we shall refer to them as Gillette games. It is much harder to see that 
Gillette games have values than that Shapley games do. In fact, it was open for many years if the 
concrete game The Big Match with only three positions that was suggested by Gillette has a value. 
This problem was resolved by Blackwell and Ferguson [8], and later, Mertens and Neyman [29] 
proved in an ingenious way that all Gillette games have value vectors, using the result of Bewley 
and Kohlberg [7]. However, the values can in general only be approximated arbitrarily well by
strategies of the players, not guaranteed exactly, and non-stationary strategies (taking history of 
play into account) are needed even to achieve such approximations. In fact, The Big Match proves 
both of these points. 

Everett games. Intermediate in generality between Shapley games and Gillette games is the model of
recursive games of Everett [21]. We shall refer to these games as Everett games, also to avoid
confusion with the largely unrelated notion of recursive games of Etessami and Yannakakis [19]. In
Everett's model, we have $a^k_{ij} = 0$ for all $i, j, k$, i.e., rewards are not accumulated during play. For
each particular $k$, we can have either $\sum_l p^{kl}_{ij} < 1$ or $\sum_l p^{kl}_{ij} = 1$. In the former case, a prespecified
payoff $b^k_{ij}$ is associated to the termination outcome. Payoff 0 is associated with infinite play. The
special case of Everett games where $b^k_{ij} = 1$ for all $k, i, j$ has been studied under the name of
concurrent reachability games in the computer science literature [17, 11, 25]. Everett showed
that Shapley games can be seen as a special case of Everett games. Also, it is easy to see Everett 
games as a special case of Gillette games. It was shown in Everett's original paper that all Everett 
games have value vectors. Like Gillette games, the values can in general only be approximated 
arbitrarily well, but unlike Gillette games, stationary strategies are sufficient for guaranteeing such 
approximations. 

For formal definitions and proofs of some of the facts above, see Section 2.

Our Results

In this paper we consider the problem of exactly solving Shapley, Everett and Gillette games, i.e., 
computing the value of a given game. The variants of these two problems for the case of perfect 
information (a.k.a. turn-based) games are well-studied by the computer science community, but not 
known to be polynomial time solvable: The tasks of solving perfect information Shapley, Everett 
and Gillette games and the task of solving Condon's simple stochastic games [13] are polynomial
time equivalent [1]. Solving simple stochastic games in polynomial time is by now a famous open 
problem. As we consider algorithms for the more general case of imperfect information games, we, 
unsurprisingly, do not come up with polynomial time algorithms. However, we describe algorithms 
for all three classes of games that run in polynomial time when the number of positions is constant 
and our algorithms are the first algorithms with this property. As the values of all three kinds 
of games may be irrational but algebraic numbers, our algorithms output real algebraic numbers 
in isolating interval representation, i.e., as a square-free polynomial with rational coefficients for
which the value is a root, together with an (isolating) interval with rational endpoints in which this 
root is the only root of the polynomial. To be precise, our main theorem is: 

Theorem. For any constant $N$, there is a polynomial time algorithm that takes as input
a Shapley, Everett or Gillette game with $N$ positions and outputs its value vector using isolating
interval encoding. Also, for the case of a Shapley game, an optimal stationary strategy for the game
in isolating interval encoding can be computed in polynomial time. Finally, for Shapley as well as
Everett games, given an additional input parameter $\varepsilon > 0$, an $\varepsilon$-optimal stationary strategy using
only (dyadic) rational valued probabilities can be computed in time polynomial in the representation
of the game and $\log(1/\varepsilon)$.

We remark that when the number of positions N is constant, what remains to vary is (most 
importantly) the number of actions m for each player in each position and (less importantly) the 
bitsize r of transition probabilities and payoffs. We also remark that Etessami and Yannakakis [20] 
showed that the bitsize of the isolating interval encoding of the value of a discounted stochastic 
game as well as the value of a recursive game may be exponential in the number of positions of the 
game and that Hansen, Koucký and Miltersen [25] showed that the bitsize of an $\varepsilon$-optimal strategy
for a recursive game using binary representation of probabilities may be exponential in the number 
of positions of the game. Thus, merely from the size of the output to be produced, there can be no 
polynomial time algorithm for the tasks considered in the theorem without some restriction on N. 
Nevertheless, the time complexity of our algorithm has a dependence on $N$ which is very bad and
does not match the size of the output. For the case of Shapley games, the exponent in the polynomial
time bound is $O(N)^{N^2}$ while for the case of Everett games and Gillette games, the exponent is
$N^{O(N^2)}$. Thus, getting a better dependence on $N$ is a very interesting open problem.

Prior to our work, algorithms for solving stochastic games relied either on generic reductions to
decision procedures for the first order theory of the reals [20, 12], or, for the case of Shapley games
and concurrent reachability games, on value or strategy iteration [33, 11]. For all these algorithms,
the complexity is at least exponential even when the number of positions is a constant and even
when only a crude approximation is required [24]. Nevertheless, as is the case for the algorithms
based on reductions to decision procedures for the first order theory of the reals, our algorithms
rely on the theory of semi-algebraic geometry [3], but in a more indirect way as we explain below.

Our algorithms are based on a simple recursive bisection pattern which is in fact a very natural 
and in retrospect unsurprising approach to solving stochastic games. However, in order to set the 
parameters of the algorithm in a way that makes it correct, we need separation bounds for values 
of stochastic games of given type and parameters, i.e., lower bounds on the absolute value of the value of games
of non-zero value. Such bounds are obtained by bounding the algebraic degree and coefficient 
size of the defining univariate polynomial and applying standard arguments, so the task at hand 
boils down to determining as good bounds on degree and coefficient size as possible; with better 
bounds leading to faster algorithms. To get these bounds, we apply the general machinery of real 
algebraic geometry and semi-algebraic geometry following closely the techniques of the seminal
work of Basu, Pollack and Roy [3]. That is, for each of the three types of games, we describe how
for a given game G to derive a formula in the first order theory of the real numbers uniquely defining 
the value of G. This essentially involves formalizing statements proved by Shapley, Everett, and 
Mertens and Neyman together with elementary properties of linear programming. Now, we apply 
the powerful tools of quantifier elimination [3, Theorem 14.16] and sampling [3, Theorem 13.11]
to show the appropriate bounds on degree and coefficient size. We stress that these procedures 
are only carried out in our proofs; they are not carried out by our algorithms. Indeed, if they 
were, the time complexity of the algorithms would be exponential, even for a constant number 
of positions. While powerful, the semi-algebraic approach has the disadvantage of giving rather 
imprecise bounds. Indeed, as far as we know, all published versions of the quantifier elimination 
theorem and the sampling theorem have unspecified constants ("big-Os"), leading to unspecified 
constants in the code of our algorithms. Only for the case of Shapley games, are we able to do 
somewhat better, their mathematics being so simple that we can avoid the use of the general tools 
of quantifier elimination and sampling and instead base our bounds on solutions to the following 
very natural concrete problem of real algebraic geometry that can be seen as a very special case of 
the sampling problem: 

Given a system of $m$ polynomials in $n$ variables (where $m$ is in general different from $n$) of
degree bounded by d, whose coefficients have bitsizes at most t, and an isolated (in the Euclidean 
topology) real root of the system, what is an upper bound on its algebraic degree as a function of d 
and n? What is a bound on the bitsizes of the coefficients of the defining polynomial? 

Basu, Pollack and Roy [3, Corollary 13.18] stated the upper bound $O(d)^n$ on the algebraic
degree as a corollary of the sampling theorem. We give a constructive bound of $(2d+1)^n$ on the
algebraic degree and we derive an explicit bound on the coefficients of the defining polynomial. We
emphasize that our techniques for doing this are standard in the context of real algebraic geometry;
in particular the deformation method and u-resultants are used. However, we find it surprising that
(to the best of our knowledge) no explicit constant for the big-O was previously stated for this very
natural problem. Also, we do not believe that $(2d+1)^n$ is the final answer and would like to see an
improvement. We hope that by stating some explicit bound we will stimulate work improving it.
We note that for the case of isolated complex roots, explicit bounds appeared recently, see Emiris,
Mourrain and Tsigaridas [18] and references therein.

The degree bounds for the algebraic problem lead to upper bounds on the algebraic degree 
of the values of Shapley games as a function of the combinatorial parameters of the game. We 
also provide corresponding lower bounds by proving that polynomials that have among their real 
roots the value of certain Shapley games are irreducible. We prove irreducibility based on Hilbert's 
irreducibility theorem and a generalization of the Eisenstein criterion. As these bounds may be of
independent interest, we state them explicitly: 

The value of any Shapley game with $N$ positions, $m$ actions for each player in each position, and
rational payoffs and transition probabilities, is an algebraic number of degree at most $(2m+5)^N$.
Also, for any $N, m \geq 1$ there exists a game with these parameters such that its value is an algebraic
number of degree $m^{N-1}$.

The lower bound strengthens a result of Etessami and Yannakakis [20] who considered the case
of $m = 2$ and proved a $2^{\Omega(N)}$ lower bound. For the more general case of Everett games and Gillette
games, we are only able to get an upper bound on the degree of the form $m^{O(N^2)}$ and consider
getting improved bounds for this case an interesting problem (we have no lower bounds better than
for the case of Shapley games). As explained above, replacing the big-Os with explicit constants
requires "big-O-less" versions of the quantifier elimination theorems and sampling theorems of 
semi-algebraic geometry. We acknowledge that it is a straightforward but also probably quite 
work-intensive task to understand exactly which constants are implied by existing proofs. Clearly, 
we would be interested in such results, and are encouraged by recent work of the real algebraic 
geometry community [4] essentially providing a big-O-less version of the very related Theorem 
13.15 of Basu, Pollack and Roy. We do hypothesize that the constants will be much worse than
the constant of our big-O-less version of Corollary 13.18 of Basu, Pollack and Roy and that merely 
stating some constants would stimulate work improving them. 

As a final byproduct of our techniques, we give a new upper bound on the complexity of the
strategy iteration algorithm for concurrent reachability games [11] that matches the known lower
bound [23]. We show: The strategy improvement algorithm of Chatterjee, de Alfaro and Henzinger
[11] computes an $\varepsilon$-optimal strategy in a concurrent reachability game with $N$ positions, $m$ actions for
each player in each position after at most $(1/\varepsilon)^{m^{O(N)}}$ iterations. Prior to this paper only a doubly
exponential upper bound on the complexity of strategy iteration was known, even for the case of
a constant number of positions [24]. The proof uses a known connection between the patience of
concurrent reachability games and the convergence rate of strategy iteration [24] combined with a
new bound on the patience proved using a somewhat more clever use of semi-algebraic geometry
than in the work leading to the previous bound [25].

Structure of the paper 

Section 2 contains background material and notation. Section 3 contains a description of our
algorithms. Section 4 contains the upper bounds on degree of values and lower bounds on coefficient
sizes of defining polynomials and resulting separation bounds of values needed for the algorithm, for
the case of Shapley, Everett and Gillette games. The proof of the exact bounds (the big-O-less version)
on the algebraic degree and the separation bounds of the isolated real solutions of a polynomial
system is presented in Section 5. Section 6 presents the lower bound construction for Shapley
games and the algebraic tools needed for this. Finally, Section 7 explains the consequences of our
results for the strategy improvement algorithm for concurrent reachability games.

2 Preliminaries 

(Parameterized) Matrix Games 

A matrix game is given by a real $m \times n$ matrix $A$ of payoffs $a_{ij}$. When Player I plays action
$i \in \{1, 2, \ldots, m\}$ and Player II simultaneously plays action $j \in \{1, 2, \ldots, n\}$, Player I receives
a payoff $a_{ij}$ from Player II. A strategy of a player is a probability distribution over the player's
actions, i.e., a stochastic vector. Given strategies $x$ and $y$ for the two players, the expected payoff to
Player I is $x^T A y$. We denote by $\mathrm{val}(A)$ the maximin value of the game. As is well known, the value
as well as an optimal mixed strategy for Player I can be found by the following linear program, in
variables $x_1, \ldots, x_m$ and $v$. By $f_n$ we denote the vector of dimension $n$ with all entries being 1.

$$\begin{array}{ll} \max & v \\ \text{s.t.} & f_n v - A^T x \leq 0 \\ & x \geq 0 \\ & f_m^T x = 1 \end{array} \qquad (1)$$

The following easy lemma of Shapley is useful.

Lemma 1 ([34], equation (2)). Let $A = (a_{ij})$ and $B = (b_{ij})$ be $m \times n$ matrix games. Then

$$|\mathrm{val}(A) - \mathrm{val}(B)| \leq \max_{i,j} |a_{ij} - b_{ij}|$$

In the following we will find it convenient to use terminology of Bertsimas and Tsitsiklis [6] . We 
say that a set of linear constraints are linearly independent if the corresponding coefficient vectors 
are linearly independent. 

Definition 2. Let $P$ be a polyhedron in $\mathbb{R}^n$ defined by linear equality and inequality constraints and
let $x \in \mathbb{R}^n$.

1. $x$ is a basic solution if all equality constraints of $P$ are satisfied by $x$, and there are $n$ linearly
independent constraints of $P$ that are satisfied with equality by $x$.

2. x is a basic feasible solution (bfs) if x is a basic solution and furthermore satisfies all the 
constraints of P. 

The polyhedron defined by LP (1) is given by 1 equality constraint and $n + m$ inequality
constraints, in $m + 1$ variables. Since the polyhedron is bounded, the LP attains its optimum value
at a bfs. To each bfs $(x, v)$, we may thus associate a set of $m + 1$ linearly independent constraints
such that turning all these constraints into linear equations yields a linear system where $(x, v)$ is
the unique solution. Furthermore we may express this solution using Cramer's rule. We order
the variables as $x_1, \ldots, x_m, v$, and we also order the constraints so that the equality constraint is
the last one. Let $B$ be a set of $m + 1$ constraints of the linear program, including the equality
constraint. We shall call such a set $B$ a potential basis set. Define $M^A_B$ to be the $(m + 1) \times (m + 1)$
matrix consisting of the coefficients of the constraints in $B$. The linear system described above can
thus be succinctly stated as follows:

$$M^A_B \begin{pmatrix} x \\ v \end{pmatrix} = e_{m+1}$$

We summarize the discussion above by the following lemma.

Lemma 3. Let $v \in \mathbb{R}$ and $x \in \mathbb{R}^m$ be given.

1. The pair $(x, v)^T$ is a basic solution of (1) if and only if there is a potential basis set $B$ such
that $\det(M^A_B) \neq 0$ and $(x, v)^T = (M^A_B)^{-1} e_{m+1}$.

2. The pair $(x, v)^T$ is a bfs of (1) if and only if there is a potential basis set $B$ such that
$\det(M^A_B) \neq 0$, $(x, v)^T = (M^A_B)^{-1} e_{m+1}$, $x \geq 0$ and $f_n v - A^T x \leq 0$.

By Cramer's rule we find that $x_i = \det((M^A_B)_i)/\det(M^A_B)$ and $v = \det((M^A_B)_{m+1})/\det(M^A_B)$.
Here $(M^A_B)_i$ is the matrix obtained from $M^A_B$ by replacing column $i$ with $e_{m+1}$.
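
The characterization of basic feasible solutions by potential basis sets and Cramer's rule can be turned into a direct (exponential time) exact solver for small matrix games: enumerate all potential basis sets, solve each nonsingular system with right-hand side $e_{m+1}$, and keep the feasible solution of largest $v$. The following Python sketch illustrates this under stated assumptions. The function names `det` and `matrix_game_value` are ours, and so is the choice of writing the constraint $x_i \geq 0$ as $-x_i \leq 0$; it is meant as an illustration only, not as a step of the algorithms of this paper.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant of a square matrix of Fractions (Gaussian elimination)."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def matrix_game_value(A):
    """Return (val(A), optimal strategy x of Player I) with exact rationals."""
    m, n = len(A), len(A[0])
    A = [[Fraction(a) for a in row] for row in A]
    # Inequality constraints over the variables (x_1, ..., x_m, v):
    # first n rows: v - sum_i a_ij x_i <= 0, next m rows: -x_i <= 0.
    ineqs = [[-A[i][j] for i in range(m)] + [Fraction(1)] for j in range(n)]
    ineqs += [[Fraction(-1) if t == i else Fraction(0) for t in range(m)]
              + [Fraction(0)] for i in range(m)]
    equality = [Fraction(1)] * m + [Fraction(0)]  # sum_i x_i = 1, ordered last
    best = None
    for chosen in combinations(range(n + m), m):  # potential basis sets
        M = [ineqs[c] for c in chosen] + [equality]
        d = det(M)
        if d == 0:
            continue
        sol = []
        for col in range(m + 1):  # Cramer's rule, right-hand side e_{m+1}
            Mi = [row[:] for row in M]
            for r in range(m + 1):
                Mi[r][col] = Fraction(1) if r == m else Fraction(0)
            sol.append(det(Mi) / d)
        x, v = sol[:m], sol[m]
        feasible = all(xi >= 0 for xi in x) and all(
            v - sum(A[i][j] * x[i] for i in range(m)) <= 0 for j in range(n))
        if feasible and (best is None or v > best[0]):
            best = (v, x)
    return best


if __name__ == "__main__":
    print(matrix_game_value([[2, 0], [0, 1]]))
```

The enumeration visits $\binom{n+m}{m}$ potential basis sets, so it is only practical for tiny games; in practice one would solve LP (1) with a linear programming solver.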

We shall be interested in parameterized matrix games. Let $A$ be a mapping from $\mathbb{R}^N$ to $m \times n$
matrix games. Given a potential basis set $B$ we will be interested in describing the sets of parameters
for which $B$ gives rise to a bfs as well as an optimal bfs for LP (1). We let $F^A_B$ denote the set of
$w \in \mathbb{R}^N$ such that $B$ defines a bfs for the matrix game $A(w)$, and we let $O^A_B$ denote the set of
$w \in \mathbb{R}^N$ such that $B$ defines an optimal bfs for the matrix game $A(w)$. Let $B_1 \subseteq \{1, \ldots, n\}$ be
the set of indices of the first $n$ constraints that are not in $B$. Similarly, let $B_2 \subseteq \{1, \ldots, m\}$ be
the indices of the next $m$ constraints that are not in $B$. We may describe the set $F^A_B$ as a union
$F^{A+}_B \cup F^{A-}_B$. Here $F^{A+}_B$ is defined to be the set of parameters $w$ that satisfy the following $m + 1$
inequalities:

$$\det(M^{A(w)}_B) > 0,$$

$$\det((M^{A(w)}_B)_{m+1}) - \sum_{i=1}^{m} a_{ij} \det((M^{A(w)}_B)_i) \leq 0 \quad \text{for } j \in B_1,$$

$$\det((M^{A(w)}_B)_i) \geq 0 \quad \text{for } i \in B_2.$$

The set $F^{A-}_B$ is defined analogously, by reversing all inequalities above. With these in place we can
describe $O^A_B$ as the set of parameters $w \in F^A_B$ for which

$$\det((M^{A(w)}_B)_{m+1}) = \mathrm{val}(A(w)) \det(M^{A(w)}_B).$$



Shapley and Everett games 

We will define stochastic games in a general form, following Everett [21], to capture both Shapley 
games as well as Everett games (but not Gillette games) as direct specializations. Everett in fact 
defined his games abstractly in terms of "game elements". We shall restrict ourselves to game 
elements that are given by matrix games (cf. [32]). Because of this, our precise notation will differ 
slightly from the one of Everett. 

For that purpose a stochastic game $\Gamma$ is specified as follows. We let $N$ denote the number
of positions, numbered $\{1, \ldots, N\}$. In every position $k$, the two players have $m_k$ and $n_k$ actions
available, numbered $\{1, \ldots, m_k\}$ and $\{1, \ldots, n_k\}$. If at position $k$ Player I chooses action $i$ and
Player II simultaneously chooses action $j$, Player I receives reward $a^k_{ij}$ from Player II. After this,
with probability $s^k_{ij} \geq 0$ the game stops, in which case Player I receives an additional reward $b^k_{ij}$ from
Player II. With probability $p^{kl}_{ij} \geq 0$, play continues at position $l$. We demand $s^k_{ij} + \sum_{l=1}^{N} p^{kl}_{ij} = 1$ for
all positions $k$ and all pairs of actions $(i, j)$. A strategy of a player is an assignment of a probability
distribution on the actions of each position, for each possible history of the play, a history being
the sequence of positions visited so far as well as the sequences of actions played by both players
in those rounds. A strategy is called stationary if it only depends on the current position.

Given a pair of strategies $x$ and $y$ as well as a starting position $k$, let $r_i$ be the random variable
denoting the reward given to Player I during round $i$ (if play has ended we define this as 0). We
define the expected total payoff by $\tau_k(x, y) = \lim_{n \to \infty} E[\sum_{i=1}^{n} r_i]$, where the expectation is taken
over actions of the players according to their strategies $x$ and $y$, as well as the probabilistic choices of
the game (in the special cases of Shapley and Everett games the limit always exists). We define the
lower value $\underline{v}_k$ and upper value $\overline{v}_k$ of the game $\Gamma$, starting in position $k$, by $\underline{v}_k = \sup_x \inf_y \tau_k(x, y)$
and $\overline{v}_k = \inf_y \sup_x \tau_k(x, y)$. In case $\underline{v}_k = \overline{v}_k$ we define this as the value $v_k$ of the game, starting
at position $k$. Assuming $\Gamma$ has a value $v_k$ starting at position $k$, we say that a strategy $x$ is optimal
for Player I, starting at position $k$, if $\inf_y \tau_k(x, y) = v_k$, and for a given $\varepsilon > 0$, we say the strategy $x$
is $\varepsilon$-optimal starting at position $k$ if $\inf_y \tau_k(x, y) \geq v_k - \varepsilon$. We define the notions of optimal and
$\varepsilon$-optimal analogously for Player II.

A Shapley game [34] is a special case of the above defined stochastic games, where $s^k_{ij} > 0$
and $b^k_{ij} = 0$ for all positions $k$ and all pairs of actions $(i, j)$. Given valuations $v_1, \ldots, v_N$ for
the positions and a given position $k$ we define $A^k(v)$ to be the $m_k \times n_k$ matrix game where
entry $(i, j)$ is $a^k_{ij} + \sum_{l=1}^{N} p^{kl}_{ij} v_l$. The value iteration operator $T : \mathbb{R}^N \to \mathbb{R}^N$ is defined by
$T(v) = (\mathrm{val}(A^1(v)), \ldots, \mathrm{val}(A^N(v)))$. The following theorem of Shapley characterizes the value and
optimal strategies of a Shapley game.

Theorem 4 (Shapley). The value iteration operator $T$ is a contraction mapping with respect to
supremum norm. In particular, $T$ has a unique fixed point $v^*$, and this is the value vector of the
stochastic game $\Gamma$. Let $x^*$ and $y^*$ be the stationary strategies for Player I and Player II where in
position $k$ an optimal strategy in the matrix game $A^k(v^*)$ is played. Then $x^*$ and $y^*$ are optimal
strategies for Player I and Player II, respectively, for play starting in any position.
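
Since $T$ is a contraction, iterating it from any starting vector converges to the value vector. The following Python sketch illustrates this for games with two actions per player in every position, where the matrix game value has a closed form. Everything here is an illustrative assumption of ours: the names `val2x2`, `shapley_operator` and `value_iteration`, the data layout `a[k][i][j]` and `p[k][i][j][l]`, and the example game. Note that this only approximates the value numerically; it is not the exact algorithm of this paper.

```python
def val2x2(M):
    """Value of a 2 x 2 matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))  # best pure guarantee of Player I
    upper = min(max(a, c), max(b, d))  # best pure guarantee of Player II
    if lower == upper:                 # saddle point in pure strategies
        return lower
    return (a * d - b * c) / (a + d - b - c)


def shapley_operator(v, a, p):
    """One application of T: position k gets val(A^k(v))."""
    n = len(v)
    out = []
    for k in range(n):
        Ak = [[a[k][i][j] + sum(p[k][i][j][l] * v[l] for l in range(n))
               for j in range(2)] for i in range(2)]
        out.append(val2x2(Ak))
    return out


def value_iteration(a, p, tol=1e-13, max_iter=100000):
    """Iterate T from the zero vector until successive iterates agree."""
    v = [0.0] * len(a)
    for _ in range(max_iter):
        w = shapley_operator(v, a, p)
        if max(abs(x - y) for x, y in zip(v, w)) < tol:
            return w
        v = w
    return v


# Hypothetical one-position game: rewards [[1, 0], [0, 1]]; play continues
# with probability 1/2 after the action pair (1, 1) and stops otherwise.
a = [[[1.0, 0.0], [0.0, 1.0]]]
p = [[[[0.5], [0.0]], [[0.0], [0.0]]]]
value = value_iteration(a, p)[0]
print(value)
```

For this example the fixed point equation is $v = (2 + v)/(4 + v)$, i.e., $v^2 + 3v - 2 = 0$, so the value is $(\sqrt{17} - 3)/2$: irrational, but algebraic of degree 2.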

An Everett game [21] is a special case of the above defined stochastic games, where $a^k_{ij} = 0$ for
all $k, i, j$. In contrast to Shapley games, we may have that $s^k_{ij} = 0$ for some $k, i, j$. Everett points
out that his games generalize the class of Shapley games. Indeed, we can convert a Shapley game $\Gamma$
to an Everett game $\Gamma'$ by letting $b^k_{ij} = a^k_{ij}/s^k_{ij}$, recalling that $s^k_{ij} > 0$.

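Given valuations $v_1, \ldots, v_N$ for the positions and a given position $k$ we define $A^k(v)$ to be
the $m_k \times n_k$ matrix game where entry $(i, j)$ is $s^k_{ij} b^k_{ij} + \sum_{l=1}^{N} p^{kl}_{ij} v_l$. The value mapping operator
$M : \mathbb{R}^N \to \mathbb{R}^N$ is then defined by $M(v) = (\mathrm{val}(A^1(v)), \ldots, \mathrm{val}(A^N(v)))$. Define relations $\succcurlyeq$ and $\preccurlyeq$
on $\mathbb{R}^N$ as follows:

$$u \succcurlyeq v \text{ if and only if } \begin{cases} u_i > v_i & \text{if } v_i > 0 \\ u_i \geq v_i & \text{if } v_i \leq 0 \end{cases} \text{ for all } i,$$

$$u \preccurlyeq v \text{ if and only if } \begin{cases} u_i < v_i & \text{if } v_i < 0 \\ u_i \leq v_i & \text{if } v_i \geq 0 \end{cases} \text{ for all } i.$$

Next, we define the regions $C_1(\Gamma)$ and $C_2(\Gamma)$ as follows:

$$C_1(\Gamma) = \{v \in \mathbb{R}^N \mid M(v) \succcurlyeq v\},$$

$$C_2(\Gamma) = \{v \in \mathbb{R}^N \mid M(v) \preccurlyeq v\}.$$

A critical vector of the game is a vector $v$ such that $v \in \overline{C_1(\Gamma)} \cap \overline{C_2(\Gamma)}$. That is, for every $\varepsilon > 0$
there exist vectors $v_1 \in C_1(\Gamma)$ and $v_2 \in C_2(\Gamma)$ such that $\|v - v_1\|_2 \leq \varepsilon$ and $\|v - v_2\|_2 \leq \varepsilon$.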

The following theorem of Everett characterizes the value of an Everett game and exhibits near- 
optimal strategies. 

Theorem 5 (Everett). There exists a unique critical vector $v$ for the value mapping $M$, and this
is the value vector of $\Gamma$. Furthermore, $v$ is a fixed point of the value mapping, and if $v_1 \in C_1(\Gamma)$
and $v_2 \in C_2(\Gamma)$ then $v_1 \leq v \leq v_2$. Let $v_1 \in C_1(\Gamma)$. Let $x$ be the stationary strategy for Player
I, where in position $k$ an optimal strategy in the matrix game $A^k(v_1)$ is played. Then for any $k$,
starting play in position $k$, the strategy $x$ guarantees expected payoff at least $v_{1,k}$ for Player I. The
analogous statement holds for $v_2 \in C_2(\Gamma)$ and Player II.
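
To make the two relations and the regions $C_1(\Gamma)$ and $C_2(\Gamma)$ concrete, the following Python sketch tests membership for a hypothetical one-position Everett game of our own choosing: the action pair $(1,1)$ stops with probability $1/2$ and payoff 1 and otherwise continues, the pairs $(1,2)$ and $(2,1)$ stop with payoff 0, and $(2,2)$ stops with payoff 1. For $v > -1$ its value mapping has the closed form $M(v) = (1+v)/(3+v)$, and its critical value is $\sqrt{2} - 1$. The function names are illustrative assumptions.

```python
from fractions import Fraction


def succ_eq(u, v):
    """u >= v in Everett's sense: strict where v_i > 0, weak where v_i <= 0."""
    return all((ui > vi) if vi > 0 else (ui >= vi) for ui, vi in zip(u, v))


def prec_eq(u, v):
    """u <= v in Everett's sense: strict where v_i < 0, weak where v_i >= 0."""
    return all((ui < vi) if vi < 0 else (ui <= vi) for ui, vi in zip(u, v))


def value_map(v):
    """M(v) for the hypothetical game: val [[(1+v)/2, 0], [0, 1]], v > -1."""
    return [(1 + v[0]) / (3 + v[0])]


def in_C1(v):
    return succ_eq(value_map(v), v)


def in_C2(v):
    return prec_eq(value_map(v), v)


if __name__ == "__main__":
    print(in_C1([Fraction(2, 5)]), in_C2([Fraction(1, 2)]))
```

Here $2/5$ lies in $C_1(\Gamma)$ and $1/2$ lies in $C_2(\Gamma)$, consistent with $2/5 \leq \sqrt{2} - 1 \leq 1/2$ as Theorem 5 requires.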

Gillette Games 

While the payoffs in Gillette's model of stochastic games cannot be captured as a special case of
the general formalism above, the general setup is the same, i.e., the parameters $N$, $m_k$, $n_k$, $a^k_{ij}$, $p^{kl}_{ij}$
are as above and the game is played as in the case of Shapley games and Everett games. In Gillette's
model, we have $b^k_{ij} = 0$ and $s^k_{ij} = 0$ for all $k, i, j$. The payoff associated with an infinite play of a
Gillette game is by definition $\liminf_{T \to \infty} (\sum_{t=1}^{T} r_t)/T$ where $r_t$ is the reward collected at round $t$.
Upper and lower values are defined analogously to the case of Everett and Shapley games, but with
the expectation of the payoff defined in this way replacing $\tau_k(x, y)$. Again, the value of position $k$ is
said to exist if its upper and lower value coincide. An Everett game can be seen as a special case of a
Gillette game by replacing each termination outcome with final reward $b$ with an absorbing position
in which the reward $b$ keeps recurring. The central theorem about Gillette games is the theorem
of Mertens and Neyman [29], showing that all such games have a value. The proof also yields the
following connection to Shapley games that is used by our algorithm: For a given Gillette game $\Gamma$,
let $\Gamma_\lambda$ be the Shapley game with all stop probabilities $s^k_{ij}$ being $\lambda$ and each transition probability
being the corresponding transition probability of $\Gamma$ multiplied by $1 - \lambda$. Let $v_k$ be the value of
position $k$ in $\Gamma$ and let $v_{k,\lambda}$ be the value of position $k$ in $\Gamma_\lambda$. Then, the following holds.

Theorem 6 (Mertens and Neyman).

$$v_k = \lim_{\lambda \to 0^+} \lambda v_{k,\lambda}$$
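
As a numerical illustration of Theorem 6, the following Python sketch computes $\lambda v_{k,\lambda}$ for the main position of The Big Match. The encoding is our assumption: Player I's first action is non-absorbing with rewards $(1, 0)$, and the second action leads to an absorbing position paying 0 or 1 forever, depending on Player II's action. In $\Gamma_\lambda$ the absorbing position with reward 1 is worth $1/\lambda$, so the absorbing row of the matrix game is $(0, 1/\lambda)$. The function names are ours.

```python
def val2x2(M):
    """Value of a 2 x 2 matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:  # saddle point in pure strategies
        return lower
    return (a * d - b * c) / (a + d - b - c)


def big_match_discounted_value(lam, iters=20000):
    """Value of the main position of Gamma_lambda, by value iteration."""
    v = 0.0
    for _ in range(iters):
        cont = (1.0 - lam) * v  # continuation value after a non-absorbing round
        v = val2x2([[1.0 + cont, cont],
                    [0.0, 1.0 / lam]])
    return v


if __name__ == "__main__":
    for lam in (0.5, 0.1, 0.01):
        print(lam, lam * big_match_discounted_value(lam))
```

Solving the fixed point equation by hand gives $v_{k,\lambda} = 1/(2\lambda)$, so $\lambda v_{k,\lambda} = 1/2$ for every $\lambda$, in agreement with the known value $1/2$ of The Big Match.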

Real Algebraic Numbers 

Let $p(x) \in \mathbb{Z}[x]$ be a nonzero polynomial with integer coefficients of degree $d$. Write
$p(x) = \sum_{i=0}^{d} a_i x^i$, with $a_d \neq 0$. The content $\mathrm{cont}(p)$ of $p$ is defined by $\mathrm{cont}(p) = \gcd(a_0, \ldots, a_d)$, and we
say that $p$ is primitive if $\mathrm{cont}(p) = 1$. We can view the coefficients of $p$ as a vector $a \in \mathbb{R}^{d+1}$. We
then define the length $|p|$ of $p$ by $|p| = \|a\|_2$ as well as the height $|p|_\infty$ of $p$ by $|p|_\infty = \|a\|_\infty$.

An algebraic number $\alpha \in \mathbb{C}$ is a root of a polynomial in $\mathbb{Q}[x]$. The minimal polynomial of $\alpha$ is
the unique monic polynomial $q \in \mathbb{Q}[x]$ of least degree with $q(\alpha) = 0$. Given an algebraic number
$\alpha$ with minimal polynomial $q$, there is a minimal integer $k \geq 1$ such that $p = kq \in \mathbb{Z}[x]$. In other
words $p$ is the unique polynomial in $\mathbb{Z}[x]$ of least degree with $p(\alpha) = 0$, $\mathrm{cont}(p) = 1$ and positive
leading coefficient. We extend the definitions of degree and height to $\alpha$ from $p$. The degree $\deg(\alpha)$
of $\alpha$ is defined by $\deg(\alpha) = \deg(p)$ and the height $|\alpha|_\infty$ of $\alpha$ is defined by $|\alpha|_\infty = |p|_\infty$.

Theorem 7 (Kannan, Lenstra and Lovász). There is an algorithm that computes the minimal
polynomial of a given algebraic number $\alpha$ of degree $n_0$ when given as input $d$ and $H$ such that
$\deg(\alpha) \leq d$ and $|\alpha|_\infty \leq H$ and $\bar{\alpha}$ such that $|\alpha - \bar{\alpha}| \leq 2^{-s}/(12d)$, where

$$s = \lceil d^2/2 + (3d + 4) \log_2(d + 1) + 2d \log_2(H) \rceil.$$

The algorithm runs in time polynomial in $n_0$, $d$ and $\log H$.
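
To get a feeling for how much precision Theorem 7 asks for, the following Python sketch simply evaluates the formula for $s$ and the resulting error bound $2^{-s}/(12d)$. The function names are ours, and the sketch only computes the required accuracy; it does not implement the reconstruction algorithm itself.

```python
import math


def kll_precision_bits(d, H):
    """s = ceil(d^2/2 + (3d + 4) log2(d + 1) + 2d log2(H))."""
    return math.ceil(d * d / 2
                     + (3 * d + 4) * math.log2(d + 1)
                     + 2 * d * math.log2(H))


def kll_required_error(d, H):
    """Upper bound 2^(-s) / (12 d) on the allowed approximation error."""
    return 2.0 ** (-kll_precision_bits(d, H)) / (12 * d)


if __name__ == "__main__":
    # Degree bound 2 and height bound 3, e.g. for a root of v^2 + 3v - 2.
    print(kll_precision_bits(2, 3), kll_required_error(2, 3))
```

The number of bits grows quadratically in the degree bound $d$ and linearly in $d \log_2 H$, which is why good bounds on degree and height translate directly into faster algorithms.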

3 Algorithms 

In this section we describe our algorithms for solving Shapley, Everett and Gillette games. The 
algorithms for Shapley and Everett games proceed along the same lines, using the fact that Shapley 
games can be seen as a special case of Everett games explained above. The algorithm for Gillette 
games is a reduction to the case of Shapley games using Theorem 6. We proceed by first constructing
the algorithms for Everett and Shapley games and explain the algorithm for Gillette games at the 
end of this section. 

Reduced games

Let $\Gamma$ be an Everett game with $N + 1$ positions. Denote by $V(\Gamma)$ the critical vector of $\Gamma$. Given a
valuation $v$ for position $N + 1$ we consider the reduced game $\Gamma^N(v)$ with $N$ positions, obtained from
$\Gamma$ in such a way that whenever the game would move to position $N + 1$, instead the game would
stop and Player I would receive a payoff $v$.

Denote by $V^N(v)$ the critical vector of the game $\Gamma^N(v)$. We have the following basic lemma
shown by Everett.

Lemma 8. For every $\delta > 0$, for all $v$ and for all positions $k$:

$$(V^N(v))_k - \delta \leq (V^N(v - \delta))_k \leq (V^N(v))_k \leq (V^N(v + \delta))_k \leq (V^N(v))_k + \delta.$$

In particular, $V^N(v)$ is a continuous monotone function of $v$ in all
components. The first and last inequalities are strict inequalities, unless $(V^N(v))_k = v$.

Let $V(v)$ denote the value $\mathrm{val}(A^{N+1}(V^N(v), v))$ of the parameterized game for position $N + 1$,
where the first $N$ positions are given valuations according to $V^N(v)$ and position $N + 1$ is given
valuation $v$.

Lemma 9. Denote by $v^*$ component $N + 1$ of $V(\Gamma)$. Then the following equivalences hold.

1. Suppose $v^* > 0$ and $v \geq 0$. Then, $V(v) > v \Leftrightarrow v < v^*$.

2. Suppose $v^* < 0$ and $v \leq 0$. Then, $V(v) < v \Leftrightarrow v^* < v$.

Proof. We shall prove only the first equivalence. The proof of the second equivalence is
analogous. Assume first that $V(v) > v$. Since $V$ is continuous we can find $z \in C_1(\Gamma^N(v))$ such that
$\mathrm{val}(A^{N+1}(z, v)) > v$ as well. This implies that $(z, v) \in C_1(\Gamma)$ and by definition of $C_1(\Gamma)$ we obtain
that $v \leq v^*$. By Theorem 5, $V(v^*) = \mathrm{val}(A^{N+1}(V^N(v^*), v^*)) = \mathrm{val}(A^{N+1}(V(\Gamma))) = v^*$. Since
$V(v) > v$ we have $v < v^*$.

The other part of the equivalence was shown by Everett as a part of his proof of Theorem 5. We
present the argument for completeness. Everett in fact shows that $v^*$ is the fixpoint of $V$ of minimum
absolute value. That is, $V(v^*) = v^*$ and whenever $V(v) = v$ we have $|v| \geq |v^*|$. Now assume that
$v < v^*$, and let $\delta = v^* - v$. From Lemma 8 we have $V(v) = V(v^* - \delta) \geq V(v^*) - \delta = v^* - \delta = v$.
Since $v \geq 0$, from minimality of $|v^*|$ we have the strict inequality $V(v) > v$. □

Recursive bisection algorithm 

Based on Lemma 9 we may construct an idealized bisection algorithm Bisect (Algorithm 1) for
approximating the last component of the critical vector, unrealistically assuming we can compute
the critical vector of a reduced game exactly. For convenience and without loss of generality, we
will assume throughout that the payoffs in the game $\Gamma$ we consider have been normalized to belong
to the interval $[-1,1]$. The correctness of the algorithm follows directly from Lemma 9. Given that
we have obtained a sufficiently good approximation for the last component of the critical vector
we may reconstruct the exact value using Theorem 7. What "sufficiently good" means depends on
the algebraic degree and size of coefficients of the defining polynomial of the algebraic number to
be given as output, so we shall need bounds on these quantities for the game at hand.
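
As a rough illustration, the idealized bisection can be sketched in Python as follows. The oracle `V` is assumed to be available as an exact callable (precisely the unrealistic assumption made above), and the names `bisect_critical` and `sign` are hypothetical helpers introduced only for this sketch.

```python
from fractions import Fraction


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def bisect_critical(V, k):
    """Idealized bisection in the spirit of Algorithm 1.

    V is an assumed exact oracle for the map v -> V(v), with payoffs
    normalized to [-1, 1]. The result is within 2**-k of the fixpoint
    of V of minimum absolute value.
    """
    v0 = V(Fraction(0))
    if v0 == 0:
        return Fraction(0)
    v_l, v_r = Fraction(0), Fraction(sign(v0))
    for _ in range(k - 1):
        v = (v_l + v_r) / 2
        if abs(V(v)) > abs(v):  # Lemma 9: v lies strictly before v*
            v_l = v
        else:
            v_r = v
    return (v_l + v_r) / 2
```

For instance, with the toy oracle `V = lambda v: (v + Fraction(3, 5)) / 2`, whose only fixpoint is 3/5, the call `bisect_critical(V, 10)` returns a rational within $2^{-10}$ of 3/5. The toy oracle is not derived from a game; it only mimics the sign pattern that Lemma 9 guarantees.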

To get an algorithm implementable as a Turing machine we will have to compute with approximations
throughout the algorithm but do so in a way that simulates Algorithm 1 exactly, i.e., so
that the same branches are followed in the if-statements of the algorithm. For this, we need
separation bounds for values of stochastic games. Fortunately, these follow from the bounds on degree

Algorithm 1: Bisect($\Gamma$, $k$)

Input: Game $\Gamma$ with $N+1$ positions, all payoffs between $-1$ and $1$, accuracy parameter $k \ge 2$.
Output: $v$ such that $|v - v^*| \le 2^{-k}$.

if $V(0) = 0$ then
    return $0$
else
    $v_l \leftarrow 0$
    $v_r \leftarrow \mathrm{sgn}(V(0))$
    for $i \leftarrow 1$ to $k-1$ do
        $v \leftarrow (v_l + v_r)/2$
        if $|V(v)| > |v|$ then
            $v_l \leftarrow v$
        else
            $v_r \leftarrow v$
    return $(v_l + v_r)/2$



Algorithm 2: ApproxBisect($\Gamma$, $k$)

Input: Game $\Gamma$ with $N+1$ positions, $m$ actions per player in each position, all payoffs
rationals between $-1$ and $1$ and of bitsize $L$, accuracy parameter $k \ge 2$.
Output: $v$ such that $|v - v^*| \le 2^{-k}$.

$\varepsilon \leftarrow \mathrm{sep}(N, m, L, 0)/5$
$v \leftarrow \mathrm{val}(A^{N+1}([\mathrm{ApproxVal}(\Gamma^r(0), \lceil -\log\varepsilon \rceil)]_{\lceil -\log\varepsilon \rceil}, 0))$
if $|v| < 2\varepsilon$ then
    return $0$
else
    $v_l \leftarrow 0$
    $v_r \leftarrow \mathrm{sgn}(v)$
    for $i \leftarrow 1$ to $k-1$ do
        $v \leftarrow (v_l + v_r)/2$
        $\varepsilon \leftarrow \mathrm{sep}(N, m, \max(L, i), i)/5$
        $v' \leftarrow \mathrm{val}(A^{N+1}([\mathrm{ApproxVal}(\Gamma^r(v), \lceil -\log\varepsilon \rceil)]_{\lceil -\log\varepsilon \rceil}, v))$
        if $|v'| > |v|$ then
            $v_l \leftarrow v$
        else
            $v_r \leftarrow v$
    return $(v_l + v_r)/2$

Algorithm 3: ApproxVal($\Gamma$, $k$)

Input: Game $\Gamma$ with $N$ positions, payoffs between $-1$ and $1$, accuracy parameter $k \ge 2$.
Output: Value vector $v$ such that $|v_i - v_i^*| \le 2^{-k}$ for all positions $i$.

if $N = 0$ then
    return the empty vector
else
    for $i \leftarrow 1$ to $N$ do
        $v_i \leftarrow$ ApproxBisect($\Gamma$, $k$), where position $i$ is swapped with position $N$
    return $v$



and coefficient size needed anyway to apply Theorem 7. Consider a class $C$ of Everett games (in fact
$C$ will be either all Everett games or the subset consisting of Shapley games). Let $\mathrm{sep}(N, m, L, j)$
denote a positive real number so that if $v$ is the value of a game $\Gamma \in C$ with $N$ positions, $m$ actions
to each player in every position, and every rational occurring in the description of the game having
bitsize at most $L$, and $v$ is not an integer multiple of $2^{-j}$, then $v$ differs by at least $\mathrm{sep}(N, m, L, j)$
from every integer multiple of $2^{-j}$. Also, we let $[v]_k$ denote the function that rounds all entries
in the vector $v$ to the nearest integer multiple of $2^{-k}$. Our modified algorithm ApproxBisect (for
approximate Bisect) is given as Algorithm 2. The procedure ApproxVal invoked in the code simply
computes approximations to the values of all positions in a game using ApproxBisect.
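
The rounding operation $[v]_k$ is simple enough to write down directly. The following is a minimal sketch with exact rational arithmetic; the function name `round_to_grid` and the tie-breaking rule (ties go upwards) are assumptions made for the example, since the text does not specify how ties are resolved.

```python
from fractions import Fraction
from math import floor


def round_to_grid(v, k):
    """Round every entry of v to the nearest integer multiple of 2**-k.

    Ties are broken upwards; this tie rule is an assumption of the sketch.
    """
    scale = 2 ** k
    return [Fraction(floor(Fraction(x) * scale + Fraction(1, 2)), scale) for x in v]
```

Each rounded entry differs from the original by at most $2^{-(k+1)}$, which is why an approximation that is accurate well below the separation bound lands on the same grid point as the exact value whenever that value is itself a multiple of $2^{-k}$.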

The correctness of ApproxBisect follows from the correctness of Bisect by observing that the 
former emulates the latter, in the sense that the same branches are followed in the if-statements. 
For the latter fact, Lemma 1 and Lemma 9 are used.

The complexity of the algorithm is estimated by the inequalities

$T_{\mathrm{ApproxVal}}(N, m, L, k) \le N \cdot T_{\mathrm{ApproxBisect}}(N, m, L, k)$,

and

$T_{\mathrm{ApproxBisect}}(N, m, L, k) \le \lceil -\log\varepsilon \rceil \bigl( T_{\mathrm{LP}}(m+1, \lceil -\log\varepsilon \rceil) + T_{\mathrm{ApproxVal}}(N-1, m, \max\{L, k\}, \lceil -\log\varepsilon \rceil) \bigr)$,

where $\varepsilon = \mathrm{sep}(N-1, m, \max\{L, k\}, k)/5$, and $T_{\mathrm{LP}}(m+1, k)$ is a bound on the complexity of
computing the value of an $m \times m$ matrix game with entries of bitsize $k$.
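
To get a feel for how these two recurrences unroll, one can count matrix-game solves at unit cost. The sketch below is only an illustration: `precision` is a hypothetical placeholder standing in for $\lceil -\log\varepsilon \rceil$ (the true quantity depends on the separation bound), and replacing $T_{\mathrm{LP}}$ by 1 is likewise an assumption of the example.

```python
def count_val(N, m, L, k, precision):
    """Unit-cost matrix-game solves used by ApproxVal on N positions."""
    if N == 0:
        return 0
    return N * count_bisect(N, m, L, k, precision)


def count_bisect(N, m, L, k, precision):
    """Unit-cost matrix-game solves used by ApproxBisect, per the recurrence."""
    # p plays the role of ceil(-log eps) with eps = sep(N-1, m, max(L,k), k)/5.
    p = precision(N - 1, m, max(L, k), k)
    return p * (1 + count_val(N - 1, m, max(L, k), p, precision))
```

Because the precision `p` of one level becomes the accuracy parameter of the next, the count compounds with every additional position, which is the source of the large exponent discussed next.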

Plugging in the separation bound for Shapley games of Proposition 12 we get a concrete
algorithm without unspecified constants. Also, to get an algorithm that outputs the exact algebraic
answer in isolating interval encoding we need to call the algorithm with parameter $k$ appropriately
chosen to match the quantities stated in Theorem 7, taking into account the degree and coefficient
bounds given in Proposition 12. Finally, plugging in a polynomial bound for $T_{\mathrm{LP}}$, the above
recurrences are now seen to yield a polynomial time bound for constant $N$. However, the exponent
in this polynomial bound is $O(N)^{N^2}$, i.e., the complexity is doubly exponential in $N$. We emphasize
that the fact that the exact value is reconstructed in the end only negligibly changes the complexity
of the algorithm compared to letting the algorithm return a crude approximation. Indeed, an
approximation algorithm following our approach would have to compute with a precision in its
recursive calls similar to the precision necessary for reconstruction. Only for games with only one
position (and hence no recursive calls) would an approximation version of ApproxBisect be faster.

For the case of Everett games, the degree, coefficient and separation bounds of Proposition 16
similarly yield the existence of a polynomial time algorithm for the case of constant $N$, with an
exponent of $N^{O(N^2)}$.

Computing strategies 

We now consider the task of computing $\varepsilon$-optimal strategies to complement our algorithm for
computing values. For Shapley games the situation is simple. By Theorem^ once we have obtained
the value $v^*$ of the game, we can obtain exactly optimal stationary strategies $x^*$ and $y^*$ by finding
optimal strategies in the matrix games $A^k(v^*)$. Also, if we only have an approximation $\tilde{v}$ to $v^*$,
such that $\|v^* - \tilde{v}\|_\infty \le \varepsilon$, consider the stationary strategies $\tilde{x}$ and $\tilde{y}$ given by optimal strategies in
the matrix games $A^k(\tilde{v})$. In every round of play, these strategies may obtain $\varepsilon$ less than the optimal
strategies. But this deficit is discounted in every round by a factor $1 - \lambda$ where $\lambda = \min(s^k_{ij}) > 0$
is the minimum stop probability. Hence $\tilde{x}$ and $\tilde{y}$ are in fact $(\varepsilon/\lambda)$-optimal strategies.
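
The factor $1/\lambda$ can be made explicit with a short calculation. As a sketch, assume the loss incurred in any single round is at most $\varepsilon$ and that play reaches round $t$ with probability at most $(1-\lambda)^t$; the total expected loss is then bounded by a geometric series:

```latex
\[
  \sum_{t=0}^{\infty} \varepsilon\,(1-\lambda)^{t}
  \;=\; \frac{\varepsilon}{1-(1-\lambda)}
  \;=\; \frac{\varepsilon}{\lambda}.
\]
```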

For Everett games the situation is more complicated, since the actual values $v^*$ may in fact
give absolutely no information about $\varepsilon$-optimal strategies. We shall instead follow the approach
of Everett and show how to find points $v_1 \in C_1$ and $v_2 \in C_2$ that are $\varepsilon$-close to $v^*$. Then, using
Theorem 5 we can compute $\varepsilon$-optimal strategies by finding optimal strategies in the matrix games
$A^k(v_1)$ and $A^k(v_2)$, respectively.

Let $\Gamma$ be an Everett game with $N+1$ positions. We first describe how to exactly compute
$v_1 \in C_1$, given the ability to exactly compute the values; the case of $v_2 \in C_2$ is analogous. Let $v^*$
be the critical vector of $\Gamma$. In case that $v^*_i \le 0$ for all $i$, then by definition of $C_1$ we have $v^* \in C_1$.
Otherwise at least one entry of $v^*$ is positive, so assume $v^*_{N+1} > 0$. As in Section 3 we consider the
reduced game $\Gamma^r(v)$, taking payoff $v$ for position $N+1$. By Lemma 9, whenever $0 \le v < v^*_{N+1}$ we
have $V(v) > v$. Suppose in fact that we pick $v$ so that $v^*_{N+1} - v \le \varepsilon/2$. Now let $\delta = V(v) - v$. Recall
$V(v) = \mathrm{val}(A^{N+1}(V^r(v), v))$. Now recursively compute $z \in C_1(\Gamma^r(v))$ such that $\|V^r(v) - z\|_\infty \le
\min(\delta/2, \varepsilon)$. Then by Lemma 1 we have that $|\mathrm{val}(A^{N+1}(V^r(v), v)) - \mathrm{val}(A^{N+1}(z, v))| \le \delta/2$, which
means $\mathrm{val}(A^{N+1}(z, v)) > v$. This means that $v_1 = (z, v) \in C_1$, and by our choices we have
$\|v_1 - v^*\|_\infty \le \varepsilon$ as desired. We now have an exact representation of an algebraic vector $v_1$ in $C_1$,
$\varepsilon$-approximating the critical vector. The size of the representation in isolating interval representation
is polynomial in the bitsize of $\Gamma$ (for constant $N$). From this we may compute the optimal strategies
of $A^k(v_1)$, which also form an $\varepsilon$-optimal strategy of $\Gamma$. The polynomial size bound on $v_1$ implies that
all non-zero entries in this strategy have magnitude at least $2^{-l}$ where $l$ is polynomially bounded
in the bitsize of $\Gamma$. We now show how to get a rational valued $2\varepsilon$-optimal strategy in polynomial
time. For this, we apply a rounding scheme described in Lemmas 14 and 15 of Hansen, Koucký
and Miltersen [25]. For each position, we now round all probabilities, except the largest, upwards
to $L$ significant digits where $L$ is a somewhat larger polynomial bound than $l$, while the largest
probability at each position is rounded downwards to $L$ significant digits. Using Lemma 14 (see also
the proof of Lemma 15) of Hansen, Koucký and Miltersen [25], we can set $L$ so that the resulting
strategy is $2\varepsilon$-optimal in $\Gamma$. This concludes the description of the procedure.

The case of Gillette games 

To compute the value of a given Gillette game, we proceed as follows. Based on Theorem 6 we can,
similarly to the case of Shapley games and Everett games, give degree, coefficient, and separation
bounds for the values of the given game. These are given in Proposition 20. Furthermore, and also
based on Theorem 6, we can for a given $\varepsilon$ give an explicit upper bound on the value of $\lambda$ necessary
for $\lambda\,\mathrm{val}(\Gamma_\lambda)$ to approximate $\mathrm{val}(\Gamma)$ within $\varepsilon$. This expression for such $\lambda$, given in Proposition 22, is of the
form $\lambda_\varepsilon = \varepsilon^{\tau m^{O(N^2)}}$. Our algorithm proceeds simply by setting $\varepsilon$ so small that an $\varepsilon$-approximation
to the value allows an exact reconstruction of the value using Theorem 7. Such $\varepsilon$ can be computed
as we have derived degree and coefficient bounds for the value of the Gillette game at hand. We
then run our previously constructed algorithm on the Shapley game $\Gamma_\lambda$, where $\lambda = \lambda_\varepsilon$.

4 Degree and separation bounds for Stochastic Games 
4.1 Shapley Games 

Our bounds on degree, coefficient size, and separation for Shapley games are obtained by a reduction
to the same question about isolated solutions of polynomial systems. The latter is treated in Section 5.
In this section we present the reduction as well as stating the consequences obtained from this
and Theorem 23 of Section 5. To analyse our reduction we also need the following simple fact.

Proposition 10 ([3], Proposition 8.12). Let $M$ be an $m \times m$ matrix, whose entries are integer
polynomials in variables $x_1, \ldots, x_n$ of degree at most $d$ and coefficients of bitsize at most $\tau$. Then
$\det(M)$, as a polynomial in variables $x_1, \ldots, x_n$, is of degree at most $dm$ and has coefficients of
bitsize at most $(\tau + \mathrm{bit}(m))m + n\,\mathrm{bit}(md + 1)$, where $\mathrm{bit}(z) = \lceil \lg z \rceil$.
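
The bound of Proposition 10 is easy to evaluate mechanically. The following sketch assumes that $\mathrm{bit}(z)$ denotes $\lceil \lg z \rceil$ for positive integers $z$ and writes $\tau$ for the coefficient bitsize; the function names `bit` and `det_bounds` are hypothetical and chosen only for this illustration.

```python
def bit(z):
    """bit(z) = ceil(lg z) for a positive integer z."""
    return (z - 1).bit_length()


def det_bounds(m, n, d, tau):
    """Degree and coefficient-bitsize bounds of Proposition 10 for det(M).

    M is m x m with entries that are integer polynomials in n variables,
    of degree at most d and coefficient bitsize at most tau.
    """
    degree = d * m
    bitsize = (tau + bit(m)) * m + n * bit(m * d + 1)
    return degree, bitsize
```

For a $3 \times 3$ matrix of quadratic polynomials in two variables with 4-bit coefficients, `det_bounds(3, 2, 2, 4)` gives degree 6 and coefficient bitsize 24.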

Also, denote by $B(v, \varepsilon)$ the ball around $v \in \mathbb{R}^N$ of radius $\varepsilon > 0$, $\{v' \in \mathbb{R}^N \mid \|v - v'\|_2 \le \varepsilon\}$. We
are now in position to present the reduction.

Theorem 11. Let $\Gamma$ be a Shapley game, with $N$ positions. Assume that in position $k$, the two
players have $m_k$ and $n_k$ actions available. Assume further that all payoffs and probabilities in $\Gamma$
are rational numbers with numerators and denominators of bitsize at most $\tau$.

Then there is a system $S$ of polynomials in variables $v_1, \ldots, v_N$, for which the value vector $v^*$
of $\Gamma$ is an isolated root. Furthermore the system $S$ consists of at most $\sum_{k=1}^{N} \binom{n_k+m_k}{m_k}$ polynomials,
each of degree at most $m + 2$ and having integer coefficients of bitsize at most $2(N+1)(m+1)^2\tau + 1$,
where $m = \max_{k=1}^{N}(\min(n_k, m_k))$.

Proof. Let $v^* \in \mathbb{R}^N$ be the fixpoint of $T$ given by Theorem^. For all positions $k$, and for all potential
basis sets $B^k$ corresponding to the parameterized matrix game $A^k$ we consider the closures $\overline{O_{B^k}}$ of
the sets $O_{B^k}$. Since there are finitely many positions and for each position finitely many potential
basis sets, we may find $\varepsilon > 0$ such that whenever $B(v^*, \varepsilon) \cap \overline{O_{B^k}} \ne \emptyset$ we have $v^* \in \overline{O_{B^k}}$ for all
positions $k$ and all potential basis sets $B^k$. For a given position $k$, let $\mathcal{B}^k$ be the set of such potential
basis sets. Then, for every $B^k \in \mathcal{B}^k$ define the polynomial

$P_{B^k}(w) = \det\bigl((M_{B^k}^{A^k(w)})_{m_k+1}\bigr) - w_k \det\bigl(M_{B^k}^{A^k(w)}\bigr)$.

Let $\mathcal{P}$ be the system of polynomials consisting of all such polynomials for all positions $k$. We
claim that $v^*$ is an isolated root of the system $\mathcal{P}$. First we show that $v^*$ is in fact a solution.
Consider any position $k$ and any polynomial $P_{B^k} \in \mathcal{P}$. By construction we have $v^* \in \overline{O_{B^k}}$, and we
may thus find a sequence $(w^i)_{i=1}^{\infty}$ in $O_{B^k}$ converging to $v^*$. Since for every $i$, $w^i \in O_{B^k}$ we have
that $\det\bigl((M_{B^k}^{A^k(w^i)})_{m_k+1}\bigr) - \mathrm{val}(A^k(w^i)) \det\bigl(M_{B^k}^{A^k(w^i)}\bigr) = 0$, and thus by continuity of the functions
$\det$, $\mathrm{val}$, and the entries of $M_{B^k}^{A^k(w)}$, we obtain $\det\bigl((M_{B^k}^{A^k(v^*)})_{m_k+1}\bigr) - \mathrm{val}(A^k(v^*)) \det\bigl(M_{B^k}^{A^k(v^*)}\bigr) = 0$. But
$\mathrm{val}(A^k(v^*)) = v^*_k$ and hence $P_{B^k}(v^*) = 0$.

Next we show that $v^*$ is unique. Indeed, suppose that $v' \in B(v^*, \varepsilon)$ is a solution to the system
$\mathcal{P}$. For each position $k$ pick a potential basis set $B^k$ such that $B^k$ describes an optimal bfs for
$A^k(v')$. Now since $v' \in B(v^*, \varepsilon)$ as well as $v' \in O_{B^k}$ we have by definition that $B^k \in \mathcal{B}^k$ and hence
$P_{B^k} \in \mathcal{P}$. As a consequence $v'$ must be a root of $P_{B^k}$. Now, since $B^k$ in particular is a basic
solution we have $\det\bigl(M_{B^k}^{A^k(v')}\bigr) \ne 0$. Combining these two facts we obtain

$v'_k = \det\bigl((M_{B^k}^{A^k(v')})_{m_k+1}\bigr) / \det\bigl(M_{B^k}^{A^k(v')}\bigr)$,

and since $B^k$ is an optimal bfs for $A^k(v')$ we have that $\mathrm{val}(A^k(v')) = v'_k$. Since this holds for all
$k$, we obtain that $v'$ is a fixpoint of $T$, and Theorem [J] then gives that $v' = v^*$.

To get the system $S$ we take (smallest) integer multiples of the polynomials in $\mathcal{P}$ such that all
polynomials have integer coefficients. For a given position $k$, we have $\binom{n_k+m_k}{m_k}$ potential basis sets,
giving the bound on the number of polynomials. Assume now that $m_k \le n_k$ (in case $m_k > n_k$ we
can consider the dual of the linear program in Lemma 3). Fix a potential basis set $B^k$.

Using Proposition 10, the degree of $P_{B^k}(w)$ is at most $1 + (m_k + 1)$. Further, to bound the
bitsize of the coefficients, note that using linearity of the determinant we may multiply each row of
the matrices $(M_{B^k}^{A^k(w)})_{m_k+1}$ and $M_{B^k}^{A^k(w)}$ by the product of the denominators of all the coefficients
of entries in the same row in the matrix $M_{B^k}^{A^k(w)}$. This product is an integer of bitsize at most
$(N+1)(m_k+1)\tau$. Hence, doing this, both matrices will have entries where all the coefficients are
integers of bitsize at most $(N+1)(m_k+1)\tau$ as well. Now by Proposition 10 again the bitsize of
the coefficients of both determinants is at most

$((N+1)(m_k+1)\tau + \mathrm{bit}(m_k))(m_k+1) + N\,\mathrm{bit}(m_k+2) \le 2(N+1)(m_k+1)^2\tau$.

From this the claimed bound follows. □

We can now state the degree and separation bounds for Shapley games. 

Proposition 12. Let $\Gamma$ be a Shapley game with $N$ positions and $m$ actions for each player in each
position and all rewards and transition probabilities being rational numbers with numerators and
denominators of bitsize at most $\tau$, and let $v$ be the value vector of $\Gamma$. Then, each entry of $v$ is an
algebraic number of degree at most $(2m+5)^N$ and the defining polynomial has coefficients of bitsize
at most $21m^2N^2\tau(2m+5)^{N-1}$. Finally, if an entry of $v$ is not an integer multiple of $2^{-j}$, it differs
from any such multiple by at least $2^{-22m^2N^2\tau(2m+5)^{N-1} - j(2m+5)^N}$.
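
To see how quickly these quantities grow, the bounds can be evaluated numerically. The sketch below takes the degree bound to be $(2m+5)^N$, the coefficient bitsize to be $21m^2N^2\tau(2m+5)^{N-1}$ and the separation to be $2^{-s}$ with $s = 22m^2N^2\tau(2m+5)^{N-1} + j(2m+5)^N$, as the computation in the proof suggests; this reading of the exponents, and the function name `shapley_bounds`, are assumptions of the example.

```python
def shapley_bounds(N, m, tau, j):
    """Evaluate the bounds of Proposition 12 (as read above).

    Returns (degree, coeff_bits, sep_bits), where the separation from any
    integer multiple of 2**-j is at least 2**-sep_bits.
    """
    base = 2 * m + 5
    degree = base ** N
    coeff_bits = 21 * m**2 * N**2 * tau * base ** (N - 1)
    sep_bits = 22 * m**2 * N**2 * tau * base ** (N - 1) + j * base ** N
    return degree, coeff_bits, sep_bits
```

Already for $N = 2$, $m = 2$, $\tau = 3$ and $j = 4$ this gives degree 81 and close to ten thousand bits of required precision, which illustrates why the number of positions must be treated as a constant.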

Proof. From Theorem 11 the value of $\Gamma$ is among the isolated real solutions of a system of
$\sum_{k=1}^{N} \binom{2m}{m}$ polynomials, of degree at most $m + 2$ and bitsize at most $2(N+1)(m+1)^2\tau + 1 \le
4Nm^2\tau$. Theorem 23 implies that the algebraic degree of the solutions is $(2(m+2)+1)^N = (2m+5)^N$
and the defining polynomial has coefficients of magnitude at most

$2^{(8m^2N^2\tau + 8Nm + 5N\lg(m))(2m+5)^{N-1}} \le 2^{21m^2N^2\tau(2m+5)^{N-1}}$.

For a position $k$, let the defining polynomial be $A(v_k)$. To compute a lower bound on the
difference between a root of $A$ and a number $2^{-j}$, it suffices to apply the map $v_k \mapsto v_k + 2^{-j}$ to $A$
and compute a lower bound for the roots of the shifted polynomial. The shifted polynomial also has
degree $(2m+5)^N$, but its maximum coefficient bitsize is now bounded by $21m^2N^2\tau(2m+5)^{N-1} +
j(2m+5)^N + 4\lg(2m+5)^N \le 22m^2N^2\tau(2m+5)^{N-1} + j(2m+5)^N$. By applying Lemma 26 we
get the result. □



4.2 Everett Games 

Our bounds on degree, coefficient size, and separation for Everett games are obtained by a reduction 
to the more general results about the first-order theory of the reals. 

Theorem 13. Let $\Gamma$ be an Everett game, with $N$ positions. Assume that in position $k$, the two
players have $m_k$ and $n_k$ actions available. Assume further that all payoffs and probabilities in $\Gamma$
are rational numbers with numerators and denominators of bitsize at most $\tau$.

Then there is a quantified formula with $N$ free variables that describes whether a vector $v^*$ is
the value vector of $\Gamma$. The formula has two blocks of quantifiers, where the first block consists of
a single variable and the second block consists of $2N$ variables. Furthermore the formula uses at
most $(2N+3) + 2(m+2)\sum_{k=1}^{N}\binom{n_k+m_k}{m_k}$ different polynomials, each of degree at most $m + 2$ and
having coefficients of bitsize at most $2(N+1)(m+2)^2\,\mathrm{bit}(m)\tau$, where $m = \max_{k=1}^{N}(\min(n_k, m_k))$.

Proof. By Theorem 5 we may express the value vector $v^*$ by the following first-order formula with
free variables $v$:

$(\forall \varepsilon)(\exists v_1, v_2)\; (\varepsilon \le 0) \vee \bigl(\|v - v_1\|^2 < \varepsilon \wedge \|v - v_2\|^2 < \varepsilon \wedge v_1 \in C_1(\Gamma) \wedge v_2 \in C_2(\Gamma)\bigr)$.

Here the expressions $v_1 \in C_1(\Gamma)$ and $v_2 \in C_2(\Gamma)$ are shorthands for the quantifier free formulas
of polynomial inequalities implied by the definitions of $C_1(\Gamma)$ and $C_2(\Gamma)$. We provide the details
below for the case of $C_1(\Gamma)$. The case of $C_2(\Gamma)$ is analogous. By definition $v_1 \in C_1(\Gamma)$ means
$M(v_1) \succcurlyeq v_1$, which in turn is equivalent to

$\bigwedge_{k=1}^{N} \bigl((\mathrm{val}(A^k(v_1)) > v_{1k} \wedge v_{1k} > 0) \vee (\mathrm{val}(A^k(v_1)) \ge v_{1k} \wedge v_{1k} \le 0)\bigr)$.

Now we can rewrite the predicate $\mathrm{val}(A^k(v_1)) > v_{1k}$ to the following expression:

$\bigvee_{B^k} \Bigl(\bigl(v_1 \in F^+_{B^k} \wedge \det((M_{B^k}^{A^k(v_1)})_{m_k+1}) > v_{1k}\det(M_{B^k}^{A^k(v_1)})\bigr) \vee \bigl(v_1 \in F^-_{B^k} \wedge \det((M_{B^k}^{A^k(v_1)})_{m_k+1}) < v_{1k}\det(M_{B^k}^{A^k(v_1)})\bigr)\Bigr)$,

where the disjunction is over all potential basis sets, and each of the expressions
$v_1 \in F^+_{B^k}$ and $v_1 \in F^-_{B^k}$ are shorthands for the conjunction of the $m_k + 1$ polynomial inequalities
describing the corresponding sets.

By a similar analysis as in the proof of Theorem 11 we get the following bounds, assuming
without loss of generality that $m_k \le n_k$: The predicates $v_1 \in F^+_{B^k}$ and $v_1 \in F^-_{B^k}$ can be written
as quantifier free formulas using at most $m_k + 1$ different polynomials, each of degree at most
$m_k + 2$ and having coefficients of bitsize at most $2(N+1)(m_k+2)^2\,\mathrm{bit}(m_k)\tau$. Also, the predicate
$\mathrm{val}(A^k(v_1)) > v_{1k}$ can be written as a quantifier free formula using at most $(m_k+2)\binom{n_k+m_k}{m_k}$
different polynomials, each of degree at most $m_k + 2$ and having coefficients of bitsize at most
$2(N+1)(m_k+2)^2\,\mathrm{bit}(m_k)\tau$.

Combining these further, for all positions we have the following statement (that shall be used
also in our upper bound for strategy iteration for concurrent reachability games in Section 7).

Lemma 14. The predicate $v_1 \in C_1(\Gamma)$ can be written as a quantifier free formula using at most
$N + (m+2)\sum_{k=1}^{N}\binom{n_k+m_k}{m_k}$ different polynomials, each of degree at most $m + 2$ and having coefficients
of bitsize at most $2(N+1)(m+2)^2\,\mathrm{bit}(m)\tau$, where $m = \max_{k=1}^{N}(\min(n_k, m_k))$.

From this the statement of the theorem easily follows. □
We shall also need the following basic statement about univariate representations.

Lemma 15. Let $\alpha$ be a root of $f \in \mathbb{Z}[x]$, which is of degree $d$ and maximum coefficient bitsize at
most $\tau$. Moreover, let $g(x) = p(x)/q(x)$ where $p, q \in \mathbb{Z}[x]$ are of degree at most $d$, have maximum
coefficient bitsize at most $\tau$, and $q(\alpha) \ne 0$. Then the minimal polynomial of $g(\alpha)$ is a univariate
polynomial of degree at most $2d$ and maximum coefficient bitsize at most $2d\tau + 7d\lg d$.

Proof. The minimal polynomial of $g(\alpha)$ is among the square-free factors of the following (univariate)
resultant with respect to $y$:

$r(x) = \mathrm{res}_y(f(y), q(y)x - p(y)) \in \mathbb{Z}[x]$.

The degree of $r$ is bounded by $d$ and its maximum coefficient bitsize is at most $2d\tau + 5d\lg d$ [3,
Proposition 8.46]. Any factor of $r$ has maximum coefficient bitsize at most $2d\tau + 7d\lg d$, due to the
Landau-Mignotte bound, see, e.g., Mignotte. □

We can now apply the machinery of semi-algebraic geometry to get the desired bounds on degree
and the separation bounds.

Proposition 16. Let $\Gamma$ be an Everett game with $N$ positions, $m$ actions for each player in each
position, and rewards and transition probabilities being rational numbers with numerators and
denominators of bitsize at most $\tau$, and let $v$ be the value vector of $\Gamma$. Then, each entry of $v$ is an
algebraic number of degree at most $m^{O(N^2)}$ and the defining polynomial has coefficients of bitsize at
most $\tau m^{O(N^2)}$. Finally, if an entry of $v$ is not a multiple of $2^{-j}$, it differs from any such multiple
by at least $2^{-\max\{\tau, j\} m^{O(N^2)}}$.

Proof. We use Theorem 14.16 (Quantifier Elimination) of Basu, Pollack and Roy [3] on the formula
of Theorem 13 to find a quantifier free formula expressing that $v$ is the value vector of the game.
Next, we apply Theorem 13.11 (Sampling) of [3] to this quantifier free formula to find a univariate
representation of the value vector $v$ satisfying the formula from Theorem 13. That is, we obtain
polynomials $f, g_0, \ldots, g_{2N}$, with $f$ and $g_0$ coprime, such that $v = (g_1(t)/g_0(t), \ldots, g_{2N}(t)/g_0(t))$,
where $t$ is a root of $f$. These polynomials are of degree $m^{O(N^2)}$ and their coefficients have bitsize
$\tau m^{O(N^2)}$. We apply Lemma 15 to the univariate representation to obtain the desired defining
polynomials. Finally, we obtain the separation bound using Lemma 26 in the same way as in the
proof of Proposition 12. □

4.3 Gillette's Stochastic Games

Our bounds on degree, coefficient size, and separation for Gillette games are obtained, as in the
case of Everett games but in a more involved way, by a reduction to the more general results about
the first-order theory of the reals.

Theorem 17. Let $\Gamma$ be a Gillette game, with $N$ positions. Assume that in position $k$, the two
players have $m_k$ and $n_k$ actions available. Assume further that all payoffs and probabilities in $\Gamma$
are rational numbers with numerators and denominators of bitsize at most $\tau$.

Then there is a quantified formula with $N$ free variables that describes whether a vector $v^*$ is the
value vector of $\Gamma$. The formula has four blocks of quantifiers, where the first three blocks consist
of a single variable and the fourth block consists of $N$ variables. Furthermore the formula uses at
most $4 + 2(m+2)\sum_{k=1}^{N}\binom{n_k+m_k}{m_k}$ different polynomials, each of degree at most $2(m+2)$ and having
coefficients of bitsize at most $2(N+1)(m+2)^2\,\mathrm{bit}(m)\tau$, where $m = \max_{k=1}^{N}(\min(n_k, m_k))$.

Proof. By Theorem 6 we may express the value vector $v^*$ by the following first-order formula with
free variables $v$.

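$(\forall \varepsilon > 0)(\exists \lambda_\varepsilon > 0)(\forall \lambda,\, 0 < \lambda \le \lambda_\varepsilon)(\exists v')\bigl(v' = \lambda\,\mathrm{val}(\Gamma_\lambda) \wedge \|v' - v\|^2 < \varepsilon\bigr)$.

Here $\Gamma_\lambda$ is the $(1-\lambda)$-discounted version of $\Gamma$, and the expression $v' = \lambda\,\mathrm{val}(\Gamma_\lambda)$ is a shorthand for a
quantifier free formula of polynomial equalities and inequalities expressing that $v'$ is the normalized
vector of values of $\Gamma_\lambda$, and may be expressed as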

/\[\/{{v'e F^"- V v' G F^r) A det((M^,'('^'))^,+i) = A^;^ det(M^J 

using Theorem S] and where $A^k$ is the parameterized matrix game corresponding to $\Gamma_\lambda$ obtained as
explained in Section 2. Here, as in the last section, the disjunction is over all potential basis sets,
and each of the expressions $v' \in F^+_{B^k}$ and $v' \in F^-_{B^k}$ are shorthands for the conjunction of the
$m_k + 1$ polynomial inequalities describing the corresponding sets.

We next analyze the bounds in the following. By a similar analysis as in the proof of Theorem 13
and Theorem 11 we get the following bounds, assuming without loss of generality that $m_k \le n_k$.

Lemma 18. The predicates $v' \in F^+_{B^k}$ and $v' \in F^-_{B^k}$ can be written as quantifier free formulas
using at most $m_k + 1$ different polynomials, each of degree at most $2(m_k + 2)$ and having coefficients
of bitsize at most $2(N+1)(m_k+2)^2\,\mathrm{bit}(m_k)\tau$.

The larger degree compared to the case of Everett games is due to the additional variable $\lambda$.
The same is true for the remaining predicate, hence we obtain the following.

Lemma 19. The predicate $v' = \lambda\,\mathrm{val}(\Gamma_\lambda)$ can be written as a quantifier free formula using at most
$(m+2)\sum_{k=1}^{N}\binom{n_k+m_k}{m_k}$ different polynomials, each of degree at most $2(m+2)$ and having coefficients
of bitsize at most $2(N+1)(m+2)^2\,\mathrm{bit}(m)\tau$, where $m = \max_{k=1}^{N}(\min(n_k, m_k))$.

From this the statement easily follows. □

Proceeding exactly as in the proof of Proposition 16 we may now prove the following proposition,
giving the exact same statement for Gillette games as for Everett games. Note, however, that
since more blocks of quantifiers have to be eliminated, the constants in the big-O's are likely worse.

Proposition 20. Let $\Gamma$ be a Gillette game with $N$ positions, $m$ actions for each player in each
position, and payoffs and transition probabilities being rational numbers with numerators and
denominators of bitsize at most $\tau$, and let $v$ be the value vector of $\Gamma$. Then, each entry of $v$ is an
algebraic number of degree at most $m^{O(N^2)}$, and the defining polynomial has coefficients of bitsize
at most $\tau m^{O(N^2)}$. Finally, if an entry of $v$ is not a multiple of $2^{-j}$, it differs from any such multiple
by at least $2^{-\max\{\tau, j\} m^{O(N^2)}}$.

Next we will obtain a bound on the discount factor for guaranteeing a sufficient approximation
of the undiscounted game by the discounted one. We will consider the same formula, strip away the
first two quantifiers, replacing the variable $\varepsilon$ by a fixed constant and letting $\lambda_\varepsilon$ be a free variable.
Next, binding the previous free variables $v$ and expressing that these take the values of the value
vector of $\Gamma$, we in effect obtain a first order formula expressing a sufficient condition for whether
a given discount factor $\gamma = 1 - \lambda$ ensures that the value vectors of $\Gamma$ and $\Gamma_\lambda$ are $\varepsilon$-close in every
coordinate.

Theorem 21. Let $\Gamma$ be a Gillette game, with $N$ positions. Assume that in position $k$, the two
players have $m_k$ and $n_k$ actions available. Assume further that all payoffs and probabilities in $\Gamma$
are rational numbers with numerators and denominators of bitsize at most $\tau$.

Let $\varepsilon = 2^{-j}$. Then there is a quantified formula with one free variable that gives a sufficient
condition for whether a given discount factor $\gamma = 1 - \lambda_\varepsilon$ guarantees that $\|\mathrm{val}(\Gamma) - \mathrm{val}(\Gamma_{\lambda_\varepsilon})\|^2 < \varepsilon$.

The formula has five blocks of quantifiers, where the first block consists of $N$ variables, second
of 1 variable, third and fourth of 2 variables and the fifth of $2N$ variables. Furthermore the
formula uses at most $6 + 4(m+2)\sum_{k=1}^{N}\binom{n_k+m_k}{m_k}$ different polynomials, each of degree at most
$2(m+2)$ and having coefficients of bitsize at most $\max\{j, 2(N+1)(m+2)^2\,\mathrm{bit}(m)\tau\}$, where
$m = \max_{k=1}^{N}(\min(n_k, m_k))$.

Proof. Following the proof of Theorem 17 above, we may express the condition by the following
first-order formula with free variable $\lambda_\varepsilon$:

$(\exists v)(\forall \lambda,\, 0 < \lambda \le \lambda_\varepsilon)(\exists v')\bigl(v' = \lambda\,\mathrm{val}(\Gamma_\lambda) \wedge \|v' - v\|^2 < \varepsilon \wedge v = \mathrm{val}(\Gamma)\bigr)$,

letting $v = \mathrm{val}(\Gamma)$ be a shorthand for the entire formula guaranteed by Theorem 17. We obtain
the formula as claimed by converting the above formula into prenex normal form. The rest of the
analysis follows closely the proof of Theorem 17 and is hence omitted. □

We can now apply again the machinery of semi-algebraic geometry to get a bound on $\lambda_\varepsilon$ above
as a function of $\varepsilon$.

Proposition 22. Let $\Gamma$ be a Gillette game with $N$ positions, $m$ actions for each player in each
position, and payoffs and transition probabilities being rational numbers with numerators and
denominators of bitsize at most $\tau$, and let $\varepsilon = 2^{-j}$. Then there exists $\lambda_\varepsilon = \varepsilon^{\tau m^{O(N^2)}}$, such that
$\|\mathrm{val}(\Gamma) - \lambda_\varepsilon\,\mathrm{val}(\Gamma_{\lambda_\varepsilon})\|^2 < \varepsilon$.

Proof. First we apply Theorem 14.16 (Quantifier Elimination) of Basu et al. [3] to the formula of
Theorem 21 to obtain an equivalent quantifier free formula. The (univariate) polynomials in this
formula are of degree $m^{O(N^2)}$ and have coefficients of bitsize $\max\{\tau, j\}\, m^{O(N^2)} = \log(1/\varepsilon)\,\tau m^{O(N^2)}$.
We can then again use Theorem 13.11 (Sampling) of [3], Lemma 15 and Lemma 25 to obtain the
lower bound $\lambda_\varepsilon = \varepsilon^{\tau m^{O(N^2)}}$. □



5 Degree and separation bounds for isolated real solutions 

In this section we prove general results about the coordinates of isolated solutions of polynomial 
systems. The result as stated below provides concrete bounds on the algebraic degree, coefficient 
size and separation. 

Theorem 23. Consider a polynomial system of equations

(S)   $g_1(x_1, \ldots, x_n) = \cdots = g_m(x_1, \ldots, x_n) = 0$,   (2)

with polynomials of degree at most $d$ and integer coefficients of magnitude at most $2^\tau$.

Then, the coordinates of any isolated (in Euclidean topology) real solutions of the system are
real algebraic numbers of degree at most $(2d+1)^n$, and their defining polynomials have coefficients
of magnitude at most $2^{2n(\tau + 4n\lg(dm))(2d+1)^{n-1}}$. Also, if $\gamma = (\gamma_1, \ldots, \gamma_n)$ is an isolated solution
of (S), then for any $i$, either

$2^{-2n(\tau + 2n\lg(dm))(2d+1)^{n-1}} \le |\gamma_i|$   or   $\gamma_i = 0$.   (3)

Moreover, given coordinates of isolated solutions of two such systems, if they are not identical, they 
differ by at least 

sep(i;) > 2"^"(^+2"'s('^™))(2'^+^)'"~'~5ig(") . (4) 
Before the proof of the theorem we will need to establish some preliminary results. 

5.1 Isolated solutions, minimizers and the u-resultant

We will use ideas from [26] used for global minimization of polynomial functions in order to
reach an appropriate system to analyze. The solutions of the system (S), which consists of real
polynomials of total degree at most $d$, are the minimizers of the polynomial

$G(x_1, \ldots, x_n) = g_1(x_1, \ldots, x_n)^2 + \cdots + g_m(x_1, \ldots, x_n)^2$   (5)

in $\mathbb{R}^n$. Furthermore, if $z$ is an isolated real solution of (S), then $z$ is an isolated minimizer for (5).
Let $G_i(x) = \partial G(x)/\partial x_i$. The critical points of $G(x)$ are among the solution set of the system

$G_1(x) = \cdots = G_n(x) = 0$.   (6)

If the number of solutions of the system above is finite, then we can use the multivariate resultant$^1$
[15], [9] to compute them. We homogenize the polynomials using a new variable $x_0$ and introduce
the linear form $G_0 = u_0x_0 + u_1x_1 + \cdots + u_nx_n$. We then compute the multivariate resultant of
$G_1, \ldots, G_n$ and $G_0$ with respect to the variables $x_0, x_1, \ldots, x_n$, and a homogeneous polynomial with
degree equal to the product of the degrees of $G_i$ is obtained. This is called the u-resultant [38], see
also [9]. If the number of solutions is finite then the resultant is non-vanishing for almost all linear
forms $G_0$, and if we factorize it to linear forms over the complex numbers then we can recover the
solutions of the system.

To compute, or as in our case to bound, the $\ell$-th coordinates of the solution set, we may assume
$u_\ell = -1$ and $u_i = 0$, for all $i$ different from $0$ and $\ell$. Then the u-resultant is a univariate polynomial
in $u_0$, and its solutions correspond to the $\ell$-th coordinates of the solutions of the system.

However, the multivariate resultant vanishes identically if the system has an infinite number of 
solutions. This is the case when the variety has positive dimension or, simply, the variety has a 
component of positive dimension at infinity, also known as excess component. 



5.2 Gröbner bases and Deformations

First we recall the following fundamental results from the theory of Gröbner bases. Let $k$ be a field
and $R = k[x_1, \ldots, x_n]$. For an extension field $K \supseteq k$ and an ideal $I \subseteq R$ we let $V_K(I) := \{x \in K^n \mid
f(x) = 0,\ \forall f \in I\}$.

$^1$Following closely [9], for $n$ homogeneous polynomials $f_1, \ldots, f_n$ in $n$ variables $x_1, \ldots, x_n$, of degrees $d_1, \ldots, d_n$
respectively, the multivariate resultant is a single polynomial in the coefficients of $f_i$, the vanishing of which is the
necessary and sufficient condition for the polynomials to have a common non-trivial solution in the algebraic closure
of the field of their coefficients. The resultant is of degree $d_1d_2 \cdots d_{i-1}d_{i+1} \cdots d_n$ in the coefficients of $f_i$.

Lemma 24. Consider an ideal $I \subseteq R$, such that $d := \dim_k R/I < \infty$.
(i) If $(z_1, \ldots, z_n) \in V_K(I)$, then $z_i \in K$ is algebraic over $k$ of degree at most $d$.
(ii) Suppose that $I = (f_1, \ldots, f_n)$ with

$f_1(x) = x_1^{d_1} + h_1(x)$
$\vdots$
$f_n(x) = x_n^{d_n} + h_n(x)$,

where $\deg(h_i) < d_i$. Then $\dim_k R/I = d_1 \cdots d_n$.

Here item (i) follows from the proof of Theorem 6, Chapter 5 of [14] (more precisely, the proof of
(v) $\Rightarrow$ (i)). Item (ii) follows from Proposition 4, also from Chapter 5 of [14], noting that $(f_1, \ldots, f_n)$
is a Gröbner basis with respect to the graded lexicographic order.

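Next, in order to apply the $u$-resultant as described above, we will symbolically perturb the
system. We need to do it in such a way that the perturbed system becomes 0-dimensional and also
that from the solutions of this perturbed system we can recover the isolated real solutions of the
original system. In [26] the deformation

$$G_\lambda(x) = G(x) + \lambda\bigl(x_1^{2(d+1)} + \cdots + x_n^{2(d+1)}\bigr),$$

where $\lambda > 0$, is introduced. By Lemma 24(ii),

$$\dim_k R/\nabla I(G_\lambda) \le (2d+1)^n$$

for $\lambda > 0$, where $\nabla I(G)$ is the gradient ideal $(\partial G/\partial x_1, \ldots, \partial G/\partial x_n)$. Let

$$X_\lambda = V(\nabla I(G_\lambda)) \subseteq \mathbb{R}^n.$$

Notice that $|X_\lambda| \le \dim_k R/\nabla I(G_\lambda) = (2d+1)^n$. We wish to reason about the "limit"
$L = \lim_{\lambda \to 0} X_\lambda$. To make this more precise we define

$$L = \{x \in \mathbb{R}^n \mid \forall \varepsilon > 0 \ \exists \lambda_\varepsilon > 0 : B(x, \varepsilon) \cap X_\lambda \neq \emptyset \text{ for every } \lambda \text{ with } 0 < \lambda < \lambda_\varepsilon\}.$$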

It is rather difficult to decide if a given point is in $L$. For one thing the polynomial system may
have several bigger components not related to the limit. In our case, we have the following result,
which allows us to recover the real solutions if we solve the system in the limit, that is as $\lambda \to 0$.

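Proposition 25. If $z = (z_1, \ldots, z_n)$ is an isolated solution of (S), then $z \in L$.

Proof. By the isolation of $z$ there exists $\delta > 0$ such that $G(x) > 0$ for every $x \in B(z, \delta) \setminus \{z\}$.
Therefore $m = \min\{G(x) \mid x \in \partial B(z, \delta)\} > 0$. Pick $\lambda > 0$ so that

$$G_\lambda(z) = \lambda\bigl(z_1^{2(d+1)} + \cdots + z_n^{2(d+1)}\bigr) < m.$$

Since

$$m \le \min\{G_\lambda(x) \mid x \in \partial B(z, \delta)\},$$

we know that the minimum of $G_\lambda$ on $B(z, \delta)$ is attained in the interior $B(z, \delta)^\circ$. Thus $X_\lambda \cap B(z, \delta) \neq \emptyset$. □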
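
The limiting behaviour in Proposition 25 can be illustrated numerically. The following is a minimal sketch on a hypothetical univariate instance that is not taken from the paper: $g(x) = x - 1$, so $d = 1$, $G(x) = (x-1)^2$ and $G_\lambda(x) = (x-1)^2 + \lambda x^4$. The only zero of $G$ is the isolated point $z = 1$, and the critical point of $G_\lambda$ in $(0, 1)$ should approach it as $\lambda \to 0$. The function names and the bisection setup are assumptions made for illustration.

```python
def grad_G_lambda(x, lam):
    """Derivative of G_lambda(x) = (x - 1)**2 + lam * x**4 (hypothetical example, d = 1)."""
    return 2.0 * (x - 1.0) + 4.0 * lam * x**3


def critical_point(lam, lo=0.0, hi=1.0, iters=200):
    """Bisection for the root of the derivative on [lo, hi].

    Assumes a sign change on the interval, which holds here because the
    derivative is -2 at x = 0 and 4*lam > 0 at x = 1.
    """
    f_lo = grad_G_lambda(lo, lam)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = grad_G_lambda(mid, lam)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # The critical point moves towards the isolated solution z = 1 as lam shrinks.
    for lam in (1e-1, 1e-3, 1e-6):
        print(lam, critical_point(lam))
```

In this toy case the critical point is roughly $1 - 2\lambda$ for small $\lambda$, so every ball around $z = 1$ eventually meets $X_\lambda$, as in the definition of $L$.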
5.3 Proof of Theorem 23

For the proof of Theorem 23 we additionally need the following fundamental bounds.
Lemma 26. [3, 23] Let $f \in \mathbb{Z}[x]$ be of degree $d$. Then for any non-zero root $\gamma$ of $f$ it holds that

$$(2\|f\|_\infty)^{-1} \le |\gamma| \le 2\|f\|_\infty.$$
If $\operatorname{sep} f$ is the separation bound, that is the minimum distance between the roots, then

sep/ = min|7, - jj\ > . 
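
As a sanity check of the root bound in Lemma 26, the following minimal sketch tests $(2\|f\|_\infty)^{-1} \le |\gamma| \le 2\|f\|_\infty$ on a hypothetical example, $f(x) = 6x^2 - 5x + 1 = (2x-1)(3x-1)$, whose roots are known exactly. The helper names are illustrative assumptions, and the script only prints the root separation without comparing it to any bound.

```python
from fractions import Fraction


def inf_norm(coeffs):
    """Maximum absolute value of the integer coefficients."""
    return max(abs(c) for c in coeffs)


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from the leading term down to the constant."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * x + c
    return acc


def satisfies_root_bounds(coeffs, root):
    """Check (2*||f||)^(-1) <= |root| <= 2*||f|| for a non-zero root of f."""
    norm = inf_norm(coeffs)
    return Fraction(1, 2 * norm) <= abs(root) <= 2 * norm


# Hypothetical example: f(x) = 6x^2 - 5x + 1 = (2x - 1)(3x - 1).
F_COEFFS = [6, -5, 1]
F_ROOTS = [Fraction(1, 2), Fraction(1, 3)]

if __name__ == "__main__":
    for r in F_ROOTS:
        print(r, evaluate(F_COEFFS, r) == 0, satisfies_root_bounds(F_COEFFS, r))
    print("separation:", abs(F_ROOTS[0] - F_ROOTS[1]))
```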

Proof of Theorem 23. Let $\gamma_j = (\gamma_{j,1}, \ldots, \gamma_{j,n})$ be isolated real solutions of the system (S). As
above, we consider

$$G(x_1, \ldots, x_n) = g_1(x_1, \ldots, x_n)^2 + \cdots + g_m(x_1, \ldots, x_n)^2$$

and its perturbation

$$G_\lambda(x) = G(x) + \lambda\bigl(x_1^{2(d+1)} + \cdots + x_n^{2(d+1)}\bigr).$$

Form the system of partial derivatives

$$f_i = G_i + (2d+2)\lambda x_i^{2d+1},$$

where $G_i(x) = \partial G(x)/\partial x_i$. We homogenize the polynomials using a new variable $x_0$ and introduce the
linear form $u_0 x_0 + \cdots + u_n x_n$ specialized to the $\ell$-th coordinate as described above. That is, we add
the polynomial

$$f_0 = u x_0 - x_\ell.$$

Let the resulting system be $(S_0)$.

For a polynomial $f$, let $\mathcal{L}(f)$ be the maximum coefficient bitsize, that is $\mathcal{L}(f) = \lceil \lg \|f\|_\infty \rceil$. We
have $\deg(G) \le 2d$ and $\mathcal{L}(G) \le 2\tau + 2n \lg(dm)$. Write $G_i$ in the form

2d-l 

Gi(x) = Cijx"'-' G Z[x], 
i=i 

where $1 \le i \le n$, and let $c$ be the set of all coefficients $c_{ij}$. It holds that $\deg(G_i) = 2d - 1$ and
$\|G_i\|_\infty \le 2d\|G\|_\infty$.
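
The two quantities used here, the bitsize $\lceil \lg \|f\|_\infty \rceil$ and the norm of a partial derivative, are easy to compute exactly. Below is a minimal sketch on a hypothetical instance ($n = 2$, $m = 1$, $d = 2$, $g = x_1^2 + 3x_2 - 7$, $G = g^2$); the dictionary representation of polynomials and all names are assumptions made for illustration.

```python
def bitsize(coeffs):
    """ceil(lg ||f||_inf) for a non-zero integer polynomial, computed exactly."""
    norm = max(abs(c) for c in coeffs)
    return (norm - 1).bit_length()


def inf_norm(poly):
    """Largest absolute coefficient of a polynomial stored as {exponents: coefficient}."""
    return max(abs(c) for c in poly.values())


def partial_derivative(poly, var):
    """Differentiate {exponent tuple: integer coefficient} with respect to variable index var."""
    result = {}
    for exps, coeff in poly.items():
        if exps[var] == 0:
            continue
        new_exps = exps[:var] + (exps[var] - 1,) + exps[var + 1:]
        result[new_exps] = result.get(new_exps, 0) + coeff * exps[var]
    return result


# Hypothetical example: g = x1^2 + 3*x2 - 7 (degree d = 2) and G = g^2, that is
# G = x1^4 + 6 x1^2 x2 - 14 x1^2 + 9 x2^2 - 42 x2 + 49.
D = 2
G = {(4, 0): 1, (2, 1): 6, (2, 0): -14, (0, 2): 9, (0, 1): -42, (0, 0): 49}

if __name__ == "__main__":
    print("bitsize of G:", bitsize(G.values()))
    for i in range(2):
        G_i = partial_derivative(G, i)
        print(i, G_i, inf_norm(G_i) <= 2 * D * inf_norm(G))
```

The inequality holds because differentiating a monomial of degree at most $2d$ multiplies its coefficient by an exponent that is at most $2d$.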

Let $D = (2d+1)^n$ and $D_i = (2d+1)^{n-1}$. For the system $(S_0)$ we consider the multivariate
resultant $R$ in the variables $x_0, x_1, \ldots, x_n$. It is a polynomial in the coefficients of $G$, $u$ and $\lambda$, that
is $R \in (\mathbb{Z}[c, \lambda])[u]$ [15]. It has degree $D_i$ in the coefficients of $G_i$, where $1 \le i \le n$, and degree $D$
in the coefficients of $G_0$, which are $1$ and $u$. To be more specific, $R$ is of the form

$$R = \cdots + \varrho_k\, u^{k}\, \mathbf{c}_{1,k}^{D_1} \mathbf{c}_{2,k}^{D_2} \cdots \mathbf{c}_{n,k}^{D_n} + \cdots,$$

where $\varrho_k \in \mathbb{Z}$ and $\mathbf{c}_{i,k}^{D_i}$ is of the form $\lambda^{\mu} c_{i,k}^{D_i - \mu}$, where the second factor corresponds to a monomial
in the coefficients $c_{ij}$ of total degree $D_i - \mu$, for some $\mu$ smaller than $D_i$.

The lowest-degree nonzero coefficient of $R$, $R_u$, seen as a univariate polynomial in $\lambda$, is a projection
operator: it vanishes on the projection of any 0-dimensional component of the algebraic set defined
by $(S_0)$ [HI mi [11]. In our case the $\ell$-th coordinates of the isolated solutions of (6) are among the
roots of $R_u$.

It holds that $R_u \in \mathbb{Z}[c][u]$ and $\deg(R_u) \le D$. Notice that the bound on the degree of $R_u$, that
is $D = (2d+1)^n$, is also an upper bound on the algebraic degree of the coordinates of the solutions
of (2), which proves the first assertion of the theorem.

To compute the bounds on the roots of $R_u$, and thus bounds on the isolated solutions of (6), we
should bound the magnitude of its coefficients. For the latter, it suffices to bound the coefficients
of $R$. We have

$$\|R\|_\infty \le \max_k \bigl|\varrho_k\, \mathbf{c}_{1,k}^{D_1} \mathbf{c}_{2,k}^{D_2} \cdots \mathbf{c}_{n,k}^{D_n}\bigr| \le \max_k |\varrho_k| \cdot \max_k \bigl|\mathbf{c}_{1,k}^{D_1} \mathbf{c}_{2,k}^{D_2} \cdots \mathbf{c}_{n,k}^{D_n}\bigr| = h \cdot C.$$

To bound $|\varrho_k|$ we need a bound on the number of integer points of the Newton polygons of the polynomials [35],
which we denote by $(\#Q_i)$. We refer to [18] for details. For all $k$ we have

n 

\Qk\<h={n+ if lli^Q^)''^ < . 
1=1 

Moreover 

n 

max|ci,fc^i C2,fe^i •••Cn.fc^M =nil^^ll- ^ (^l|G'||oor''^ = C . 

i=l 

Hence 

\\Ru\\oo < \\R\\oo <hC = (2L'd||G||)"^i < 22"(^+2"ig{'^"^))(2rf+i)"-i . 
Using Cauchy's bound (Lemma 26), any of the non-zero roots $\gamma_{j,\ell}$ of $R_u$ satisfies

> \\Ru\\^' > (hC)-^ > 2-2"(-+2nlg(<^™))(2rf+l)"-l . 

Notice that the defining polynomial of $\gamma_{j,\ell}$ is the square-free part of $R_u$, which has bitsize at most
2n(T-h2nlg((im))(2d-M)"-i + (2d -M)'"'-^ 21g(2(i < 2n{T + Anlg{dm)){2d + l)""'^ . 
To bound the minimum distance between the isolated roots of (S), we notice that

$$\sqrt{n}\, \operatorname{sep}(S) \ge \sqrt{n} \min_{i \neq j} \|\gamma_i - \gamma_j\|_\infty \ge \min_{i \neq j} \|\gamma_i - \gamma_j\|_2 \ge \min_{i \neq j} |\gamma_{i,\ell} - \gamma_{j,\ell}|,$$

for any $1 \le \ell \le n$, where the last minimum is considered over all $\gamma_{i,\ell} \neq \gamma_{j,\ell}$.
Using the separation bound for univariate polynomials (Lemma 26), we get

sep(i?„) = min|7i,^ - 7j- ^1 > > (/D||i?„||oo)^~^, 

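and so

$$\operatorname{sep}(R_u) = \min_{i \neq j} |\gamma_{i,\ell} - \gamma_{j,\ell}| \ge 2^{-3n(\tau + 2n\lg(dm))(2d+1)^{2n-1}}.$$

Finally

$$\operatorname{sep}(S) \ge \operatorname{sep}(R_u)/\sqrt{n} \ge 2^{-3n(\tau + 2n\lg(dm))(2d+1)^{2n-1} - \frac{1}{2}\lg(n)}.$$
This completes the proof. □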
Better bounds should be possible for the algebraic degree of Theorem 23, based for example on the
Oleinik-Petrovskii and Milnor-Thom [31, 37] bound for the sum of Betti numbers of a set of real zeros
of a polynomial system, or on improved estimates by Basu [2] on individual Betti numbers; see also
[5]. This should lead to improved separation bounds, if used in conjunction with neat deformation
techniques and bounds on parametric Gröbner bases, e.g. [27], and/or bounds based on the
Generalized Characteristic Polynomial and sparse multivariate resultants [10, 18]. Nevertheless, it
is not possible to beat the single exponential nature of the bound, and only improvements in the
constants involved are expected.

6 Degree lower bounds for values of Shapley games 

In this section we give a construction of a Shapley game $\Gamma_{N,m}$ with $N + 1$ positions, each having at
most $m$ actions, such that the algebraic degree of the value of one of the positions is at least $m^N$.

Previously, Etessami and Yannakakis [20] gave a reduction from the so-called square-root sum
problem to the quantitative decision problem of Shapley games. In fact, from this reduction one
can obtain a Shapley game with $N$ positions where the algebraic degree of the value of one of the
positions is $2^{\Omega(N)}$.

Our results below can be viewed as a considerable extension of this, showing how the number
of actions can affect the algebraic degree. Comparing with the upper bound $m^{O(N)}$ shows that our
result is close to optimal. The idea of the game we construct is very simple. The game consists
of a dummy game position that just gives rise to a probability distribution over the remaining
$N$ positions. Each of the remaining $N$ positions is by itself an independent Shapley game
consisting of a single position with $m$ actions. We will construct these $N$ games in such a way
that their values are independent algebraic numbers, each of degree $m$. Then a suitable linear
combination of these, corresponding to the probability distribution, will cause the dummy position
to have a value which is an algebraic number of degree $m^N$.

Actually implementing this approach seems to bring significant challenges when m > 2. However 
using the powerful Hilbert's irreducibility theorem we are able to give a simple existence proof of 
a Shapley game with the properties as stated above. Next, we will also give an explicit proof of 
existence using elementary but more involved arguments. 

6.1 The single position game 

Let $\alpha_1, \ldots, \alpha_m > 0$ be arbitrary positive numbers and $0 < \beta < 1$. Consider the Shapley game
$\Gamma(\alpha, \beta)$ consisting of a single position where each player has $m$ actions, with payoffs $a_{ii} = \alpha_i$
and $a_{ij} = 0$ for $i \neq j$, and transition probabilities $p^{11}_{ii} = \beta$ and $p^{11}_{ij} = 0$ for $i \neq j$. Thus to $\Gamma(\alpha, \beta)$
corresponds the parameterized matrix game given by the diagonal matrix $\operatorname{diag}(\alpha_1 + \beta v, \ldots, \alpha_m + \beta v)$.

By Theorem HI and since the game is given by a diagonal matrix with strictly positive entries
on the diagonal, we find that the value of the game $v$ satisfies the equation

$$1 = v \sum_{i=1}^{m} \frac{1}{\alpha_i + \beta v}. \qquad (7)$$

More precisely, consider a diagonal matrix game $\operatorname{diag}(a_1, \ldots, a_m)$ with strictly positive entries
$a_1, \ldots, a_m > 0$ on the diagonal, let $p$ and $q$ be optimal strategies for the row and column player,
respectively, and let $v > 0$ be the value of the game. Firstly, all $p_i > 0$, as otherwise the column
player could ensure payoff $0$ by playing strategy $i$. Thus $v = a_i q_i$ for all $i$, and hence also $q_i > 0$ for
all $i$. But then similarly we have $v = a_i p_i$ for all $i$. Rearranging to $p_i = v/a_i$ and summing
over $i$ gives $1 = v \sum_i 1/a_i$, which is the claimed equation.
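
The argument above gives closed forms that are easy to check. The following minimal sketch computes the value and optimal strategies of a diagonal matrix game, and then approximates the value of the single-position Shapley game by iterating $v \mapsto 1/\sum_i 1/(\alpha_i + \beta v)$. The iteration scheme, the function names and the sample inputs are assumptions made for illustration, not part of the paper's construction.

```python
from fractions import Fraction


def diagonal_game(diagonal):
    """Value and optimal strategy of the matrix game diag(a_1, ..., a_m) with all a_i > 0."""
    value = 1 / sum(Fraction(1) / a for a in diagonal)
    strategy = [value / a for a in diagonal]  # p_i = q_i = v / a_i
    return value, strategy


def shapley_value(alphas, beta, iters=200):
    """Floating-point fixed point of v = val(diag(alpha_i + beta * v)), found by iteration."""
    v = 0.0
    for _ in range(iters):
        v = 1.0 / sum(1.0 / (a + beta * v) for a in alphas)
    return v


if __name__ == "__main__":
    print(diagonal_game([1, 2]))
    print(shapley_value([1.0, 2.0], 0.5))
```

For $\alpha = (1, 2)$ and $\beta = 1/2$, clearing denominators in the fixed-point equation gives $3v^2 + 6v - 8 = 0$, which the iterate satisfies up to rounding.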

Define the polynomial $f_m(v) = \prod_{i=1}^{m} (\alpha_i + \beta v)$. Then $f_m'(v) = \beta \sum_{i=1}^{m} \prod_{j \neq i} (\alpha_j + \beta v)$.
Multiplying by $f_m(v)$ on both sides of equation (7) we obtain the following.

$$f_m(v) = v \sum_{i=1}^{m} \prod_{j \neq i} (\alpha_j + \beta v) = \frac{1}{\beta}\, v f_m'(v).$$

In the following we will specialize $\beta = 1/c$, for some $c > 1$. We then obtain that $v$ is a root of the
univariate polynomial

$$F_m(v) = f_m(v) - c\, v f_m'(v).$$
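
To make the polynomial $F_m(v) = f_m(v) - c\,v f_m'(v)$ concrete, here is a minimal sketch that builds its coefficients exactly and checks numerically that the game value is a root. The sample parameters $\alpha = (1, 2)$, $c = 2$, the fixed-point iteration used to approximate the value, and all function names are assumptions made for illustration.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, x):
    """Evaluate a coefficient list (lowest degree first) at x."""
    return sum(coef * x**k for k, coef in enumerate(p))


def build_F(alphas, c):
    """Coefficients of F_m(v) = f_m(v) - c*v*f_m'(v), where f_m(v) = prod_i (alpha_i + v/c)."""
    f = [Fraction(1)]
    for a in alphas:
        f = poly_mul(f, [Fraction(a), Fraction(1) / c])
    # The coefficient of v^k in v*f'(v) is k times the coefficient of v^k in f(v).
    return [coef - c * k * coef for k, coef in enumerate(f)]


def fixed_point_value(alphas, c, iters=200):
    """Approximate the game value by iterating v -> 1 / sum_i 1/(alpha_i + v/c)."""
    v = 0.0
    for _ in range(iters):
        v = 1.0 / sum(1.0 / (a + v / c) for a in alphas)
    return v


if __name__ == "__main__":
    F = build_F([1, 2], 2)
    v = fixed_point_value([1, 2], 2)
    print(F, v, poly_eval(F, v))
```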

6.2 Existence using Hilbert's irreducibility theorem 

We next present the simple existence proof using (a version of) Hilbert's irreducibility theorem.
Lemma 27. If $c > 1$ is rational, then $F_m(v, \alpha_1^2, \ldots, \alpha_m^2)$
is irreducible as a multivariate polynomial in $v, \alpha_1, \ldots, \alpha_m$.

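Proof. This uses induction on $m$. For $m = 1$ we have $F_1 = (1/c - 1)v + \alpha_1^2$, which is irreducible in
$\mathbb{Q}[v, \alpha_1]$. The induction step proceeds as follows, where $f_m$ and $F_m$ are taken with $\alpha_i^2$ in place of $\alpha_i$.

$$\begin{aligned}
F_m &= f_m - c v f_m' = \bigl(\alpha_m^2 + \tfrac{1}{c} v\bigr) f_{m-1} - c v \frac{d}{dv}\Bigl(\bigl(\alpha_m^2 + \tfrac{1}{c} v\bigr) f_{m-1}\Bigr) \\
&= f_{m-1}\alpha_m^2 + \tfrac{1}{c} v f_{m-1} - c v \Bigl(\tfrac{1}{c} f_{m-1} + \bigl(\alpha_m^2 + \tfrac{1}{c} v\bigr) f_{m-1}'\Bigr) \\
&= f_{m-1}\alpha_m^2 + \tfrac{1}{c} v f_{m-1} - v f_{m-1} - c v f_{m-1}' \alpha_m^2 - v^2 f_{m-1}' \\
&= \bigl(f_{m-1} - c v f_{m-1}'\bigr)\alpha_m^2 + v\bigl((\tfrac{1}{c} - 1) f_{m-1} - v f_{m-1}'\bigr) \\
&= F_{m-1}\alpha_m^2 + v\bigl((1/c - 1) f_{m-1} - v f_{m-1}'\bigr).
\end{aligned}$$

If $F_{m-1}$ is associated to $F := (\tfrac{1}{c} - 1) f_{m-1} - v f_{m-1}'$ in the polynomial ring $\mathbb{Q}[v, \alpha_1, \ldots, \alpha_{m-1}]$, then
we would have $(\tfrac{1}{c} - 1) F_{m-1} = F$, leading to the contradiction $(\tfrac{1}{c} - 1)c = 1$. Since $F_{m-1}$ is irreducible
by induction, it follows that

$$\gcd\bigl(F_{m-1}, (1/c - 1) f_{m-1} - v f_{m-1}'\bigr) = 1$$

and therefore that

$$F_m = F_{m-1}\alpha_m^2 + v\bigl((1/c - 1) f_{m-1} - v f_{m-1}'\bigr) \in \mathbb{Q}[v, \alpha_1, \ldots, \alpha_{m-1}][\alpha_m]$$

is irreducible. □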
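
The decomposition used in the induction step can be checked mechanically. Writing $a$ for the parameter attached to the last factor, so that $f_m = (a + v/c) f_{m-1}$, the identity reads $F_m = a F_{m-1} + v\bigl((1/c - 1) f_{m-1} - v f_{m-1}'\bigr)$. The sketch below verifies it with exact rational arithmetic for a few arbitrarily chosen parameter values; the helper names and test values are assumptions for illustration.

```python
from fractions import Fraction


def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_scale(p, s):
    return [s * a for a in p]


def poly_shift(p):
    """Multiply by v (coefficients are listed lowest degree first)."""
    return [Fraction(0)] + p


def poly_diff(p):
    return [k * a for k, a in enumerate(p)][1:] or [Fraction(0)]


def trim(p):
    """Drop zero coefficients of highest degree so that equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def f_poly(params, c):
    """f(v) = prod over a in params of (a + v/c)."""
    f = [Fraction(1)]
    for a in params:
        f = poly_add(poly_scale(f, Fraction(a)), poly_scale(poly_shift(f), Fraction(1) / c))
    return f


def F_poly(params, c):
    """F(v) = f(v) - c*v*f'(v)."""
    f = f_poly(params, c)
    return poly_add(f, poly_scale(poly_shift(poly_diff(f)), -c))


def induction_rhs(params, c):
    """a*F_{m-1} + v*((1/c - 1)*f_{m-1} - v*f_{m-1}'), with a = params[-1]."""
    head, a = params[:-1], Fraction(params[-1])
    f_prev = f_poly(head, c)
    inner = poly_add(poly_scale(f_prev, Fraction(1) / c - 1),
                     poly_scale(poly_shift(poly_diff(f_prev)), Fraction(-1)))
    return poly_add(poly_scale(F_poly(head, c), a), poly_shift(inner))


if __name__ == "__main__":
    for params in ([1, 4], [1, 4, 9], [2, 3, 5, 7]):
        print(params, trim(F_poly(params, 2)) == trim(induction_rhs(params, 2)))
```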
We recall the following version of Hilbert's irreducibility theorem (see [22], Corollary 11.7),
sufficient for our purposes.

Theorem 28 (Hilbert). Let $K$ be a finite extension field of $\mathbb{Q}$ and $f \in K[x, y_1, \ldots, y_n]$ an irreducible
polynomial. Then there exists $(a_1, \ldots, a_n) \in \mathbb{Q}^n$ such that

$$f(x, a_1, \ldots, a_n) \in K[x]$$

is an irreducible polynomial.

We are now in position to show existence of the Shapley game $\Gamma_{N,m}$.

Theorem 29. For any $N, m \ge 1$ there exists a Shapley game with $N + 1$ positions, each having $m$
actions for each player, such that the value of position $N + 1$ in the game is an algebraic number of
degree $m^N$.

Proof. We shall construct the first $N$ positions as independent Shapley games as described above.
For the base case of $N = 1$, using Lemma 27 we simply invoke Theorem 28 on the polynomial
$F_m(v, \alpha_1^2, \ldots, \alpha_m^2)$ with $c = 2$, say. This gives a specialization of $\alpha_1, \ldots, \alpha_m \in \mathbb{Q}$ such that the
value of the game $\Gamma((\alpha_1^2, \ldots, \alpha_m^2), 1/2)$ is an algebraic number $v_1$ of degree $m$.

Now assume by induction that we have constructed $N - 1$ single-position Shapley games with
values $v_1, \ldots, v_{N-1}$ together with positive integer coefficients $k_1, \ldots, k_{N-1}$ such that
$v' = k_1 v_1 + \cdots + k_{N-1} v_{N-1}$ is an algebraic number of degree $m^{N-1}$. Invoke Theorem 28 on the polynomial
$F_m(v, \alpha_1^2, \ldots, \alpha_m^2)$ as before, but now over the extension field $\mathbb{Q}(v')$. This again gives a specialization
of $\alpha_1, \ldots, \alpha_m \in \mathbb{Q}$ such that the value of the game $\Gamma((\alpha_1^2, \ldots, \alpha_m^2), 1/2)$ is an algebraic number $v_N$
of degree $m$, but now over $\mathbb{Q}(v')$. We may now find a positive integer $k_N$ such that $v' + k_N v_N$ is an
algebraic number of degree $m^{N-1} m = m^N$ over $\mathbb{Q}$.

Now we may construct the $N + 1$ position game as follows. Let $K = k_1 + \cdots + k_N$. In position
$N + 1$, regardless of the players' actions, with probability $1/2$ the game ends, and with probability
$k_i/(2K)$ the play proceeds in position $i$. No payoff is awarded. Clearly the value of position $N + 1$ is
exactly $(k_1 v_1 + \cdots + k_N v_N)/(2K)$ and is thus an algebraic number of degree $m^N$. □
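
The final step of the construction is a simple averaging. The sketch below assumes that the transition probability from the extra position to position $i$ is $k_i/(2K)$ with $K = k_1 + \cdots + k_N$ (the reading under which the probabilities sum to $1/2$), and computes the resulting value from given single-position values. The weights and values in the example are hypothetical placeholders, not outputs of Hilbert's theorem.

```python
from fractions import Fraction


def dummy_position(weights, values):
    """Transition probabilities k_i/(2K), stopping probability, and value of the extra position.

    weights: positive integers k_1, ..., k_N.
    values:  values v_1, ..., v_N of the single-position games (floats, illustration only).
    """
    K = sum(weights)
    probs = [Fraction(k, 2 * K) for k in weights]
    stop = 1 - sum(probs)  # probability that play ends
    value = sum(float(p) * v for p, v in zip(probs, values))
    return probs, stop, value


if __name__ == "__main__":
    print(dummy_position([1, 2, 3], [1.0, 2.0, 3.0]))
```

Since no payoff is awarded in the extra position, its value is the expected value of the position reached, which is the linear combination $\sum_i k_i v_i / (2K)$.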

6.3 An explicit specialization 

Write $E_k(\alpha) = E_k(\alpha_1, \ldots, \alpha_m)$ for the $k$-th elementary symmetric polynomial in $\alpha_1, \ldots, \alpha_m$, i.e.

$$E_k(\alpha) = \sum_{1 \le i_1 < \cdots < i_k \le m} \alpha_{i_1} \cdots \alpha_{i_k}$$

for $1 \le k \le m$. For notational convenience we define $E_0(\alpha) = 1$. We have not been able to find a
reference in the literature for the following lemma. For a complete factorization of $S_k(x)$ we refer
to Lemma 35.

Lemma 30. Let $S_k(x) = E_k(1, x, \ldots, x^{m-1})$, where $x$ is a variable. Then

$$\gcd(S_1(x), \ldots, S_{m-1}(x)) = \Phi_m(x),$$

where $\Phi_m$ is the $m$-th cyclotomic polynomial.
Proof. Define

$$f(t, x) = (t - 1)(t - x)(t - x^2) \cdots (t - x^{m-1})
= t^m - S_1(x)\, t^{m-1} + \cdots + (-1)^{m-1} S_{m-1}(x)\, t + (-1)^m S_m(x).$$

If $\xi$ is a primitive $m$-th root of unity, then $f(t, \xi) = t^m - 1$ and therefore $S_j(\xi) = 0$ for
$j = 1, \ldots, m - 1$. If $\xi$ is not a primitive $m$-th root of unity, then $f(t, \xi)$ has multiple roots, showing that
$S_j(\xi) \neq 0$ for some $j = 1, \ldots, m - 1$. Thus the greatest common divisor of $S_1(x), \ldots, S_{m-1}(x)$ is
the product of $(x - \xi)$, where $\xi$ runs through the primitive $m$-th roots of unity. This polynomial is
precisely $\Phi_m(x)$. □
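
One half of Lemma 30, namely that the $m$-th cyclotomic polynomial divides each of $S_1(x), \ldots, S_{m-1}(x)$, can be checked by exact polynomial division. The following is a minimal sketch with integer coefficient lists (lowest degree first); it does not verify that the greatest common divisor is no larger. All function names are assumptions made for illustration.

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_shift(p, k):
    """Multiply an integer polynomial by x**k."""
    return [0] * k + p


def poly_divmod(num, den):
    """Quotient and remainder on division by a monic integer polynomial."""
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for i in range(len(num) - len(den), -1, -1):
        q = num[i + len(den) - 1]
        quot[i] = q
        for j, d in enumerate(den):
            num[i + j] -= q * d
    return quot, num[:len(den) - 1]


def cyclotomic(m):
    """m-th cyclotomic polynomial: divide x^m - 1 by the d-th one for each proper divisor d."""
    poly = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            poly, rem = poly_divmod(poly, cyclotomic(d))
            assert not any(rem)
    return poly


def S_polys(m):
    """[S_0, ..., S_m] with S_k(x) = E_k(1, x, ..., x^(m-1))."""
    E = [[1]] + [[0] for _ in range(m)]
    for i in range(m):  # adjoin the variable alpha = x^i
        for k in range(m, 0, -1):
            E[k] = poly_add(E[k], poly_shift(E[k - 1], i))
    return E


if __name__ == "__main__":
    for m in (3, 4, 6):
        phi, S = cyclotomic(m), S_polys(m)
        print(m, phi, [not any(poly_divmod(S[k], phi)[1]) for k in range(1, m)])
```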

We now derive the following formula for $F_m$, giving the coefficients explicitly.

Lemma 31.

$$F_m(v) = \sum_{k=0}^{m} E_{m-k}(\alpha)(1 - ck)(v/c)^k.$$

Proof. First we have

$$f_m(v) = \prod_{i=1}^{m} (\alpha_i + v/c) = \sum_{k=0}^{m} E_{m-k}(\alpha)(v/c)^k,$$

and thus

$$f_m'(v) = \sum_{k=0}^{m} E_{m-k}(\alpha)\, k\, v^{k-1} (1/c)^k.$$

We can then conclude

$$F_m(v) = \sum_{k=0}^{m} E_{m-k}(\alpha)\bigl((v/c)^k - c v (k v^{k-1} (1/c)^k)\bigr) = \sum_{k=0}^{m} E_{m-k}(\alpha)(1 - ck)(v/c)^k.$$

□
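
The coefficient formula of Lemma 31 can be cross-checked against a direct expansion of $f_m(v) - c\,v f_m'(v)$. The sketch below does this with exact rationals for sample inputs; it assumes $c$ is a positive integer (so that `Fraction(1, c**k)` is valid), and the function names and test parameters are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """E_k(values); the empty product makes E_0 = 1."""
    return sum(prod(combo) for combo in combinations(values, k))


def F_from_lemma(alphas, c):
    """Coefficients (lowest degree first) of sum_k E_{m-k}(alpha) (1 - c k) (v/c)^k."""
    m = len(alphas)
    return [elementary_symmetric(alphas, m - k) * (1 - c * k) * Fraction(1, c**k)
            for k in range(m + 1)]


def F_direct(alphas, c):
    """Coefficients of f_m(v) - c v f_m'(v), expanding f_m(v) = prod_i (alpha_i + v/c)."""
    f = [Fraction(1)]
    for a in alphas:
        f = [a * (f[k] if k < len(f) else 0) + Fraction(1, c) * (f[k - 1] if k else 0)
             for k in range(len(f) + 1)]
    # v * f'(v) has coefficient k * f_k at v^k.
    return [coef * (1 - c * k) for k, coef in enumerate(f)]


if __name__ == "__main__":
    print(F_from_lemma([1, 2, 5], 3))
    print(F_direct([1, 2, 5], 3))
```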



Lemma 32. The polynomial

$$F(v) = \sum_{k=0}^{m} E_{m-k}(\alpha)(1 - ck)\, c^{m-k} v^k = c^m F_m(v), \qquad (8)$$

is irreducible for an infinite number of specializations of $\alpha$ and $c$.

Proof. We consider the polynomial $G(v) = v^m F(1/v)$. Obviously $F(v)$ is irreducible if and only if
$G(v)$ is. Moreover, we let $\alpha_i = x^{i-1}$, for $1 \le i \le m$, for $x \in \mathbb{Z}_+$ to be specified in the sequel. By
abuse of notation we also denote this polynomial as $G(v)$, which is

$$G(v) = \sum_{k=0}^{m} (1 - (m - k)c)\, c^k \cdot S_k(x) \cdot v^k.$$

By Lemma 30 all the coefficients of $G(v)$, except the leading and the trailing coefficient, have
$\Phi_m(x)$ as a common divisor. Now specialize to $x = \ell m$ with $\ell \ge 2$ and $\ell \in \mathbb{N}$. Let $p$ be a prime
divisor of $\Phi_m(x)$. Then $p \nmid x$. There exist infinitely many $c \in \mathbb{N}$ such that $p \mid 1 - mc$, since $p \nmid m$.
By possibly replacing $c$ by $c + p$ we may assume that $1 - mc = hp$, where $p \nmid h$.

With this choice of $c$, $p \nmid c$ and $p$ divides the constant term of $G(v)$ precisely once. Moreover, $p$
is not a divisor of the leading coefficient of $G(v)$, which is $x^{m(m-1)/2} c^m$.

We conclude using Eisenstein's criterion (Theorem 36) that $G(v)$, and hence $F(v)$, is
irreducible for this (infinite) class of specializations. □
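
A concrete instance of this specialization may help. The sketch below takes the hypothetical choice $m = 3$ and $x = 2m = 6$, for which the third cyclotomic polynomial evaluates to $6^2 + 6 + 1 = 43$, a prime, and $c = 29$, for which $1 - 3c = -2 \cdot 43$. It then checks the three Eisenstein conditions for $G(v) = \sum_k (1 - (m-k)c)\, c^k S_k(x) v^k$ at $p = 43$. The numbers and function names are assumptions chosen for illustration.

```python
from itertools import combinations
from math import prod


def S(k, m, x):
    """S_k(x) = E_k(1, x, ..., x^(m-1)) evaluated at an integer x."""
    return sum(prod(combo) for combo in combinations([x**i for i in range(m)], k))


def G_coefficients(m, x, c):
    """Coefficients of G(v) = sum_k (1 - (m - k) c) c^k S_k(x) v^k, lowest degree first."""
    return [(1 - (m - k) * c) * c**k * S(k, m, x) for k in range(m + 1)]


def eisenstein_applies(coeffs, p):
    """Eisenstein's criterion at the prime p; coeffs are listed lowest degree first."""
    *lower, leading = coeffs
    return (all(a % p == 0 for a in lower)
            and leading % p != 0
            and coeffs[0] % (p * p) != 0)


# Hypothetical instance: m = 3, x = 6, p = 43, c = 29 (so that 1 - 3*29 = -86 = -2*43).
M, X, P, C = 3, 6, 43, 29

if __name__ == "__main__":
    coeffs = G_coefficients(M, X, C)
    print(coeffs)
    print(eisenstein_applies(coeffs, P))
```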

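Lemma 33. Let $F_j(v)$ be as in (8), i.e.

$$F_j(v) = \sum_{k=0}^{m} E_{m-k}(\alpha_{1j}, \ldots, \alpha_{mj})(1 - c_j k)\, c_j^{m-k} v^k, \qquad (9)$$

where $1 \le j \le n$. Let $\gamma_j$ be any root of $F_j(v)$. Then there is an infinite number of specializations of
$\alpha_{ij}$ and $c_j$ such that

$$[\mathbb{Q}(\gamma_1, \ldots, \gamma_n) : \mathbb{Q}] = m^n.$$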



Proof. We consider the specialization $\alpha_{ij} = x^{i-1}$, where $1 \le i \le m$, $1 \le j \le n$, for an $x \in \mathbb{Z}_+$ to be
specified in the sequel.

As before, we let $S_k(m) = E_k(1, x, \ldots, x^{m-1})$, and we perform the transformations $G_j(v) =
v^m F_j(1/v)$, where $1 \le j \le n$, and we obtain the polynomials

$$G_j(v) = \sum_{k=0}^{m} (1 - (m - k)c_j)\, c_j^k \cdot S_k(m) \cdot v^k.$$

We pick an $x \in \mathbb{Z}_+$ so that $\Phi_m(x)$ has at least $n$ distinct prime factors $p_1, \ldots, p_n$ that are
relatively prime to $m$. For such a procedure we refer to Lemma 39. For $1 \le j \le n$, we choose $c_j$ so
that the equation $1 - m c_j = b_j p_j$ is satisfied for an integer $b_j$, and $p_j$ is not a divisor of $b_j$.

All but the leading and trailing coefficients of $G_j$ have $\Phi_m(x)$ as their common GCD, according
to Lemma 30, and hence they are $0 \bmod p_j$, for $1 \le j \le n$.

To summarize, for the $n$ primes $p_j$ it holds:

• None of them divides any of the leading coefficients of the $G_j$.

• For each $j$, $p_j$ divides the constant term of $G_j(v)$, $p_j^2$ does not, and $p_j$ does not divide the
constant term of any of the other polynomials.

• For all $G_j$, all the coefficients but the leading and the constant term are divisible by $p_j$.

Hence, according to Theorem 37, if $\gamma_j$ is a root of $G_j$, then

$$[\mathbb{Q}(\gamma_1, \ldots, \gamma_n) : \mathbb{Q}] = m^n.$$

□

Lemma 34. Let $F_j(v)$ be as in (9) and let $\gamma_j$ be any root of $F_j(v)$. Then there is an infinite number
of specializations of $\alpha_{ij}$ and $c_j$ such that for all but a finite number of $k_j \in \mathbb{Q}$ it holds that

$$[\mathbb{Q}(\gamma_1 + k_2\gamma_2 + \cdots + k_n\gamma_n) : \mathbb{Q}] = m^n,$$

where $1 \le j \le n$.
Proof. The existence of the $k_i$ is guaranteed by the existence of a primitive element [38]. That is, for
all but a finite number of values of $k_j \in \mathbb{Q}$ it holds that $\mathbb{Q}(\gamma_1, \ldots, \gamma_n) = \mathbb{Q}(\gamma_1 + k_2\gamma_2 + \cdots + k_n\gamma_n)$, and
from Lemma 33 we conclude for the degree.

To find explicit values for the $k_i$ we modify slightly the proof of the existence of a primitive element
[38]. Let $\gamma_{ji}$ be all the roots of $F_j(v)$, where $1 \le i \le m$. It is without loss of generality to assume
that $\gamma_j = \gamma_{j1}$.

Let $\beta_2 = \gamma_1 + k_2\gamma_2$. For $\mathbb{Q}(\beta_2) = \mathbb{Q}(\gamma_1, \gamma_2)$ to hold, it should be

$$k_2 \neq \frac{\gamma_{11} - \gamma_{1i}}{\gamma_{2\ell} - \gamma_{21}}$$

for all $1 \le i \le m$ and $1 < \ell \le m$, and hence there are at most $(m-1)m$ forbidden values for $k_2$.
This means that there is at least one positive integer between $1$ and $m^2$ that we can assign $k_2$ to,
so that $\mathbb{Q}(\gamma_1 + k_2\gamma_2) = \mathbb{Q}(\beta_2) = \mathbb{Q}(\gamma_1, \gamma_2)$.

If we let $\beta_3 = \beta_2 + k_3\gamma_3 = \gamma_1 + k_2\gamma_2 + k_3\gamma_3$, then for $\mathbb{Q}(\beta_3) = \mathbb{Q}(\gamma_1, \gamma_2, \gamma_3)$ to hold, it should be

$$k_3 \neq \frac{\beta_{21} - \beta_{2i}}{\gamma_{3\ell} - \gamma_{31}} = \frac{(\gamma_{11} - \gamma_{1,i_1}) + k_2(\gamma_{21} - \gamma_{2,i_2})}{\gamma_{3\ell} - \gamma_{31}}$$

for all $1 \le i \le m^2$, $1 \le i_1 \le m$, $1 \le i_2 \le m$ and $1 < \ell \le m$. Hence there are at most $(m-1)m^2$
forbidden values for $k_3$, and so there are at least two integers between $1$ and $(m-1)m + (m-1)m^2 =
(m^2 - 1)m$ that $k_2$ and $k_3$ could be assigned to, so that $\mathbb{Q}(\beta_3) = \mathbb{Q}(\gamma_1, \gamma_2, \gamma_3)$.
We continue similarly, and eventually we let

$$\beta = \beta_n = \beta_{n-1} + k_n\gamma_n = \gamma_1 + k_2\gamma_2 + \cdots + k_n\gamma_n.$$

We consider

$$k_n \neq \frac{\beta_{n-1,1} - \beta_{n-1,i}}{\gamma_{n\ell} - \gamma_{n1}} = \frac{(\gamma_{11} - \gamma_{1,i_1}) + k_2(\gamma_{21} - \gamma_{2,i_2}) + \cdots + k_{n-1}(\gamma_{n-1,1} - \gamma_{n-1,i_{n-1}})}{\gamma_{n\ell} - \gamma_{n1}}$$

for all $1 \le i \le m^{n-1}$, $1 \le i_1, \ldots, i_{n-1} \le m$, and $1 < \ell \le m$. There are at most $m^{n-1}(m-1)$ forbidden values for
$k_n$.

Overall, there are at least $n - 1$ integers between $1$ and $m(m^{n-1} - 1) \sim m^n$ that $k_2, \ldots, k_n$ could
be assigned to, so that

$$\mathbb{Q}(\gamma_1, \ldots, \gamma_n) = \mathbb{Q}(\gamma_1 + k_2\gamma_2 + \cdots + k_n\gamma_n).$$

Using the previous lemma we conclude for the degree. □
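
The counting argument in this proof can be mirrored in a few lines. The sketch below lists the forbidden values $(\gamma_{11} - \gamma_{1i})/(\gamma_{2\ell} - \gamma_{21})$ for the first step and picks the smallest positive integer that avoids them, using floating point with a tolerance; it also sums the per-step counts $(m-1)m^{j-1}$ for $j = 2, \ldots, n$. The example roots ($\pm\sqrt{2}$ and $\pm\sqrt{3}$) and all names are hypothetical, chosen only for illustration.

```python
import math


def smallest_admissible_integer(roots1, roots2, tol=1e-9):
    """Smallest positive integer k avoiding (g11 - g1i) / (g2l - g21) for all i and all l > 1.

    roots1, roots2: the distinct roots of the two polynomials, with the chosen roots
    g11 and g21 listed first. Floating point with a tolerance; illustration only.
    """
    forbidden = [(roots1[0] - r1) / (r2 - roots2[0])
                 for r1 in roots1 for r2 in roots2[1:]]
    k = 1
    while any(abs(k - f) < tol for f in forbidden):
        k += 1
    return k, forbidden


def forbidden_count_bound(m, n):
    """Sum over j = 2..n of (m - 1) * m^(j-1), which equals m * (m^(n-1) - 1)."""
    return sum((m - 1) * m ** (j - 1) for j in range(2, n + 1))


if __name__ == "__main__":
    r2, r3 = math.sqrt(2.0), math.sqrt(3.0)
    print(smallest_admissible_integer([r2, -r2], [r3, -r3]))
    print(forbidden_count_bound(3, 4))
```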

Now combining Lemma 33 and Lemma 34 we can immediately turn the proof of Theorem 29
into an explicit proof of existence.

6.3.1 Auxiliary results 

A similar lemma appears in [36]. 



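Lemma 35. Let $E_k$ be the elementary symmetric polynomials in $n$ variables $\alpha_1, \ldots, \alpha_n$, where
$0 \le k \le n$, that is $E_k(\alpha_1, \ldots, \alpha_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \alpha_{i_1} \cdots \alpha_{i_k}$.
Let $S_k(n) = E_k(1, x, \ldots, x^{n-1})$, where $1 \le k \le n$. Then it holds that

$$S_k(n) = x^{k(k-1)/2} \prod_{\lambda=1}^{k} \frac{x^{n-\lambda+1} - 1}{\Phi_\lambda^{\lfloor k/\lambda \rfloor}}
= x^{k(k-1)/2} \Bigl(\prod_{i_1 \mid n} \Phi_{i_1} \prod_{i_2 \mid (n-1)} \Phi_{i_2} \cdots \prod_{i_k \mid (n-k+1)} \Phi_{i_k}\Bigr) \Bigl(\prod_{\lambda=1}^{k} \Phi_\lambda^{\lfloor k/\lambda \rfloor}\Bigr)^{-1},$$

where $\Phi_\ell = \Phi_\ell(x)$ is the $\ell$-th cyclotomic polynomial.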



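Proof. We prove the formula using double induction.

Evidently the formula holds for $S_1(1)$, and we can easily prove that it holds for $S_1(n)$, for every
$n$. It also holds for $S_k(k)$ for all $k$.

From the definition of the elementary symmetric polynomials it holds that $E_k(\alpha_1, \ldots, \alpha_n) =
E_k(\alpha_1, \ldots, \alpha_{n-1}) + \alpha_n E_{k-1}(\alpha_1, \ldots, \alpha_{n-1})$, and hence $S_k(n) = S_k(n-1) + x^{n-1} S_{k-1}(n-1)$. We
assume that the formula holds for $S_k(n-1)$ and $S_{k-1}(n-1)$ and we prove that it holds for $S_k(n)$.

$$\begin{aligned}
S_k(n) &= S_k(n-1) + x^{n-1} S_{k-1}(n-1) \\
&= x^{k(k-1)/2} \prod_{\lambda=1}^{k} \frac{x^{n-\lambda} - 1}{\Phi_\lambda^{\lfloor k/\lambda \rfloor}}
 + x^{n-1} x^{(k-1)(k-2)/2} \prod_{\mu=1}^{k-1} \frac{x^{n-\mu} - 1}{\Phi_\mu^{\lfloor (k-1)/\mu \rfloor}} \\
&= x^{k(k-1)/2}\, \frac{\prod_{\lambda=1}^{k-1} (x^{n-\lambda} - 1)}{\prod_{\lambda=1}^{k} \Phi_\lambda^{\lfloor k/\lambda \rfloor}}
 \Bigl(x^{n-k} - 1 + x^{n-k} \prod_{\lambda \mid k} \Phi_\lambda\Bigr) \\
&= x^{k(k-1)/2}\, \frac{\prod_{\lambda=1}^{k-1} (x^{n-\lambda} - 1)}{\prod_{\lambda=1}^{k} \Phi_\lambda^{\lfloor k/\lambda \rfloor}}
 \bigl(x^{n-k} - 1 + x^{n-k}(x^k - 1)\bigr) \\
&= x^{k(k-1)/2} \prod_{\lambda=1}^{k} \frac{x^{n-\lambda+1} - 1}{\Phi_\lambda^{\lfloor k/\lambda \rfloor}}.
\end{aligned}$$

The formula follows if we also consider that $x^n - 1 = \prod_{d \mid n} \Phi_d$. □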
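
Because $\prod_{\lambda=1}^{k} (x^\lambda - 1)$ equals the product of the cyclotomic factors appearing in the denominator, the identity being proved is equivalent to $E_k(1, x, \ldots, x^{n-1}) = x^{k(k-1)/2} \prod_{\lambda=1}^{k} (x^{n-\lambda+1} - 1)/(x^\lambda - 1)$. The following minimal sketch checks this equivalent form exactly at a few integer points; it is a numerical spot check under that reformulation, not a proof, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def S_direct(k, n, x):
    """E_k(1, x, ..., x^(n-1)) by direct expansion."""
    return sum(prod(combo) for combo in combinations([x**i for i in range(n)], k))


def S_closed_form(k, n, x):
    """x^(k(k-1)/2) * prod_{lam=1..k} (x^(n-lam+1) - 1) / (x^lam - 1), for an integer x >= 2."""
    value = Fraction(x ** (k * (k - 1) // 2))
    for lam in range(1, k + 1):
        value *= Fraction(x ** (n - lam + 1) - 1, x**lam - 1)
    return value


if __name__ == "__main__":
    for n in range(1, 6):
        for k in range(1, n + 1):
            print(n, k, S_direct(k, n, 2) == S_closed_form(k, n, 2))
```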

Theorem 36 (Eisenstein's criterion). Let $f(x) = a_n x^n + \cdots + a_1 x + a_0$ be a polynomial with integer
coefficients. Let $p$ be a prime such that (i) $p$ divides each $a_i$ for $0 \le i < n$, (ii) $p$ does not divide
$a_n$, and (iii) $p^2$ does not divide $a_0$. Then $f$ is irreducible over the rational numbers.

Theorem 37 (Generalized Eisenstein's criterion). \2Sl Let

$$f_i(x) = x^{n_i} + a_{i,1} x^{n_i - 1} + \cdots + a_{i,n_i},$$

where $1 \le i \le s$ and all the coefficients of all the polynomials belong to $O$.

If there exist non-archimedean valuations $v_1, v_2, \ldots, v_s$ of $K$ such that $t(v_1) = p_1, \ldots, t(v_s) = p_s$
are distinct primes, and that

$$v_i(a_{i,n_i}) = 1, \quad v_i(a_{j,n_j}) = 0, \quad \text{and} \quad v_i(a_{k,r}) \ge 1,$$

where $1 \le i, j, k \le s$, $i \neq j$, $1 \le r \le n_k - 1$, then, for any choice of the roots of $f_i(x)$, say $\gamma_i$,
$1 \le i \le s$, we have

$$[K(\gamma_1, \ldots, \gamma_s) : K] = n_1 n_2 \cdots n_s.$$
Remark 38 (A note on the leading coefficient). Theorem 37 is a generalization of Eisenstein's
criterion. It assumes that all the polynomials are monic. However, it is without loss of generality
to assume that the corresponding primes do not divide the leading coefficients. This is so because
we can transform a non-monic polynomial to a monic one, such that the theorem holds, as follows.
Given a polynomial

$$g(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$$

where $a_n \neq 1$, we multiply all the coefficients by $a_n^{n-1}$; then

$$g_1(x) = a_n^n x^n + a_{n-1} a_n^{n-1} x^{n-1} + \cdots + a_n^{n-1} a_1 x + a_n^{n-1} a_0.$$

If we set $y = a_n x$, then

$$g_2(y) = y^n + a_{n-1} y^{n-1} + \cdots + a_n^{n-2} a_1 y + a_n^{n-1} a_0 = y^n + c_{n-1} y^{n-1} + \cdots + c_1 y + c_0,$$

where, in order for Eisenstein's criterion or its generalization to hold, it suffices that the prime $p$ does not
divide the leading coefficient $a_n$ of $g$. If the roots of $g_2$ are $\beta_i$, then the roots of $g$ are $\gamma_i = \beta_i/a_n$.
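
The substitution in Remark 38 is mechanical, and a small example shows the scaling of the roots. The sketch below maps the coefficients $a_0, \ldots, a_n$ of $g$ to those of the monic polynomial $g_2$ via $c_k = a_n^{n-1-k} a_k$ and checks on the hypothetical example $g(x) = 2x^2 - 3x + 1$ that the roots of $g_2$ are $a_n$ times the roots of $g$. Function names and the example are assumptions for illustration.

```python
from fractions import Fraction


def make_monic(coeffs):
    """Map g(x) = a_n x^n + ... + a_0 (coeffs listed a_0, ..., a_n) to g_2(y) with y = a_n x.

    Returns c_0, ..., c_n with c_k = a_n^(n-1-k) * a_k for k < n and c_n = 1.
    """
    n = len(coeffs) - 1
    a_n = coeffs[-1]
    return [a_n ** (n - 1 - k) * a_k for k, a_k in enumerate(coeffs[:-1])] + [1]


def evaluate(coeffs, x):
    """Evaluate a coefficient list (lowest degree first) at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    g = [1, -3, 2]            # 2x^2 - 3x + 1 = (2x - 1)(x - 1)
    g2 = make_monic(g)        # y^2 - 3y + 2 = (y - 1)(y - 2)
    print(g2)
    for root in (Fraction(1, 2), Fraction(1)):
        print(root, evaluate(g, root), evaluate(g2, 2 * root))
```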

Lemma 39. Let $f \in \mathbb{Z}[x]$ be a non-constant polynomial and $a$ an integer, such that $f(a)$ has the
prime divisors $p_1, \ldots, p_k$ with $k \ge 1$. Then there exists an integer $b$ such that $f(b)$ is divisible by
at least $k + 1$ primes.

Proof. We consider the polynomial

$$g(x) := f(f(a)^2 x + a) = f(a) + f(a)^2 x h(x) = f(a)\bigl(1 + f(a) x h(x)\bigr),$$

where $h(x) \in \mathbb{Z}[x]$ is non-zero. Notice that $g(x)$ is divisible by $p_1, \ldots, p_k$ for every $x \in \mathbb{Z}$. Now the
result follows, since a prime dividing $1 + f(a) x h(x)$ for $x \in \mathbb{Z}$ cannot be among $p_1, \ldots, p_k$. □
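
The proof of Lemma 39 is constructive, and the search it suggests can be sketched directly: try $b = f(a)^2 x + a$ for $x = 1, 2, \ldots$ until $f(b)$ has a prime divisor beyond those of $f(a)$. The code below is a minimal illustration with trial-division factoring; the search cap `max_x`, the function names and the example $f(x) = x^2 + 1$, $a = 1$ are assumptions made for illustration.

```python
def prime_divisors(n):
    """Set of prime divisors of an integer, by trial division (empty for 0 and +-1)."""
    n = abs(n)
    primes = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def one_more_prime(f, a, max_x=1000):
    """Search b = f(a)^2 * x + a, x = 1, 2, ..., until f(b) gains a new prime divisor."""
    base = prime_divisors(f(a))
    for x in range(1, max_x + 1):
        b = f(a) ** 2 * x + a
        found = prime_divisors(f(b))
        if base < found:  # strict superset of the primes dividing f(a)
            return b, found
    raise ValueError("no suitable x found below max_x")


if __name__ == "__main__":
    # f(1) = 2; b = 4*1 + 1 = 5 gives f(5) = 26 = 2 * 13.
    print(one_more_prime(lambda x: x * x + 1, 1))
```

Iterating this step is one way to reach a point where the polynomial value has as many distinct prime factors as needed.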



7 Upper bounds for value and strategy iteration for concurrent 
reachability games 

In this section we explain how the techniques of Section H], as used for Everett games, also yield an
improved analysis of the strategy improvement algorithm for concurrent reachability games.

Let $\Gamma$ be an Everett game with $N$ positions. Assume that in position $k$, the two players have
$m_k \le m$ and $n_k \le m$ actions available. Assume further that all payoffs and probabilities in $\Gamma$ are
rational numbers with numerators and denominators of bitsize at most $\tau$. Further, let $\sigma$ be a fixed
positive integer.

From Lemma 14 we get the following statement.

Lemma 40. There is a quantifier free formula with $2N$ free variables $v_1$ and $v_2$ that expresses
$v_1 \in C_1(\Gamma)$, $v_2 \in C_2(\Gamma)$, and $\|v_1 - v_2\|^2 \le 2^{-\sigma}$.

The formula uses at most {2N + 1) + 2(m + 2) XlfcLi ("^m™*) different polynomials, each of 
degree at most $m + 2$ and having coefficients of bitsize at most $\max\{\sigma, 2(N+1)(m+2)\tau\}$, where
$m = \max_{k=1}^{N} (\min(n_k, m_k))$.

Theorem 41. Let $\Gamma$ and $\sigma$ be as above. Let $\varepsilon = 2^{-\sigma}$. Then there exists an $\varepsilon$-optimal strategy of
$\Gamma$ where each probability is a real algebraic number, defined by a polynomial of degree $m^{O(N)}$ and
maximum coefficient bitsize $\max(\sigma, \tau)\, m^{O(N)}$.
Proof. We use Theorem 13.11 of [3] to find a univariate representation of the pair $(v_1, v_2)$ satisfying
the formula from Lemma 40. That is, we have polynomials $f, g_0, \ldots, g_{2N}$, with $f$ and $g_0$ coprime,
such that the points $(v_1, v_2)$ are given as $(g_1(t)/g_0(t), \ldots, g_{2N}(t)/g_0(t))$, where $t$ is a root of $f$.
These polynomials are of degree $m^{O(N)}$ and their maximum coefficient bitsize is $\max(\sigma, \tau)\, m^{O(N)}$.

Now consider the matrix games $A^k(v_1)$ for all positions $k$. We find optimal strategies $p^1, \ldots, p^N$
that correspond to basic feasible solutions of the linear program LP (1). Notice that the elements
of these matrix games are rational polynomial functions in $g_0, \ldots, g_N$. By Lemma 3 we have
$p^k_i = \det((M^k_{B_k})_i)/\det(M^k_{B_k})$ for some potential basis sets $B_1, \ldots, B_N$. Using Lemma 10, each
$p^k_i$ is a rational polynomial function in $g_0, \ldots, g_N$ of degree $m^{O(N)}$ and maximum coefficient bitsize
$\max(\sigma, \tau)\, m^{O(N)}$. Substituting the root $t$ of $f$ using Lemma 15 we obtain the statement. □

Using Lemma 26 we deduce:

Corollary 42. An Everett game with coefficient bitsize bounded by $\tau$ has an $\varepsilon$-optimal strategy
where the probabilities are either zero or bounded from below by $2^{-\max(\sigma, \tau)\, m^{O(N)}}$.

We now apply Lemma 3 of Hansen, Ibsen-Jensen and Miltersen [24] and conclude that value
iteration and strategy iteration on a deterministic concurrent reachability game (where $\tau = O(1)$)
will compute an $\varepsilon$-optimal strategy after at most $(1/\varepsilon)^{m^{O(N)}}$ iterations. This matches the lower bound
obtained by Hansen, Ibsen-Jensen and Miltersen [24].